Sequential optimistic ad-hoc methods for nonstationary multi_armed bandit problem

17th Iranian Conference on Electrical Engineering (ICEE2009), 2009-05-12

Title: Sequential optimistic ad-hoc methods for nonstationary multi_armed bandit problem

Authors: Majid Mazouchi, Farzaneh Tatari, Mohammad Bagher Naghibi Sistani

Abstract

A common way to illustrate the exploration-exploitation trade-off in reinforcement learning is the multi-armed bandit problem (MABP). In this paper we consider the MABP in a nonstationary environment whose features change during the learning period. The learning algorithms presented are intuition-based solutions to the exploration-exploitation trade-off, known as ad hoc methods. These methods include action-value methods with ε-greedy and softmax action selection rules, the probability matching method, and the adaptive pursuit method. To produce near-optimal results, we convert the ad hoc methods into sequential optimistic ad hoc methods, which yield better results.
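As an illustration of one baseline named in the abstract, the following is a minimal sketch of an ε-greedy action-value learner on a nonstationary bandit. It is not the paper's sequential optimistic method, whose details the abstract does not give. The drifting Gaussian reward model, the function name `run_epsilon_greedy`, and all parameter names and defaults are assumptions made for this example.

```python
import random


def run_epsilon_greedy(n_arms=5, steps=2000, epsilon=0.1, alpha=0.1,
                       q_init=0.0, drift=0.01, seed=0):
    """Epsilon-greedy action-value learning on a drifting Gaussian bandit.

    Every name and default here is an illustrative assumption.
    Returns (average reward, final estimates, final true means).
    """
    rng = random.Random(seed)
    # Assumed environment: each arm pays a Gaussian reward around a hidden mean.
    true_means = [rng.gauss(0.0, 1.0) for _ in range(n_arms)]
    estimates = [q_init] * n_arms
    total_reward = 0.0
    for _ in range(steps):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])
        reward = rng.gauss(true_means[arm], 1.0)
        total_reward += reward
        # A constant step size weights recent rewards more heavily,
        # so the estimate can follow a changing mean.
        estimates[arm] += alpha * (reward - estimates[arm])
        # Nonstationarity: every hidden mean takes a small random-walk step.
        true_means = [m + rng.gauss(0.0, drift) for m in true_means]
    return total_reward / steps, estimates, true_means


if __name__ == "__main__":
    avg, _, _ = run_epsilon_greedy()
    print(f"average reward with q_init=0.0: {avg:.3f}")
    avg_opt, _, _ = run_epsilon_greedy(q_init=5.0)
    print(f"average reward with q_init=5.0: {avg_opt:.3f}")
```

Setting `q_init` well above the typical reward is the classic optimistic-initial-values trick, which pushes the learner to try every arm early on. Whether this matches what the authors mean by "optimistic" cannot be determined from the abstract alone.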

Keywords

Sequential optimistic ad hoc methods, Exploration-exploitation, Multi-armed bandit, Reinforcement learning, Action selection

@inproceedings{paperid:1022487,
author = {Mazouchi, Majid and Tatari, Farzaneh and Naghibi Sistani, Mohammad Bagher},
title = {Sequential optimistic ad-hoc methods for nonstationary multi_armed bandit problem},
booktitle = {17th Iranian Conference on Electrical Engineering (ICEE2009)},
year = {2009},
location = {Tehran, Iran},
keywords = {Sequential optimistic ad hoc methods; Exploration_exploitation; Multi_armed bandit; Reinforcement learning; Action selection},
}

%0 Conference Proceedings
%T Sequential optimistic ad-hoc methods for nonstationary multi_armed bandit problem
%A Mazouchi, Majid
%A Tatari, Farzaneh
%A Naghibi Sistani, Mohammad Bagher
%J 17th Iranian Conference on Electrical Engineering (ICEE2009)
%D 2009
