17th Iranian Conference on Electrical Engineering (ICEE2009), 2009-05-12

Title: Sequential optimistic ad-hoc methods for nonstationary multi-armed bandit problem

Authors: Majid Mazouchi, Farzaneh Tatari, Mohammad Bagher Naghibi Sistani

Citation: BibTeX | EndNote

Abstract

One common way to illustrate the exploration-exploitation trade-off in reinforcement learning is the multi-armed bandit problem (MABP). In this paper we consider the MABP in a nonstationary environment whose features change during the learning period. The learning algorithms presented are intuition-based solutions to the exploration-exploitation trade-off, known as ad hoc methods. They include action-value methods with ε-greedy and softmax action selection rules, the probability matching method, and the adaptive pursuit method. To produce near-optimal results, we modify the ad hoc methods into sequential optimistic ad hoc methods, which yield substantially better results.

Keywords

Sequential optimistic ad hoc methods, Exploration-exploitation, Multi-armed bandit, Reinforcement learning, Action selection

@inproceedings{paperid:1022487,
author = {Mazouchi, Majid and Tatari, Farzaneh and Naghibi Sistani, Mohammad Bagher},
title = {Sequential optimistic ad-hoc methods for nonstationary multi-armed bandit problem},
booktitle = {17th Iranian Conference on Electrical Engineering (ICEE2009)},
year = {2009},
location = {Tehran, Iran},
keywords = {Sequential optimistic ad hoc methods; Exploration-exploitation; Multi-armed bandit; Reinforcement learning; Action selection},
}


%0 Conference Proceedings
%T Sequential optimistic ad-hoc methods for nonstationary multi-armed bandit problem
%A Mazouchi, Majid
%A Tatari, Farzaneh
%A Naghibi Sistani, Mohammad Bagher
%J 17th Iranian Conference on Electrical Engineering (ICEE2009)
%D 2009
