Adversarial Bandit Problems with Multiple Plays



Abstract

Adversarial bandit problems, studied by Auer et al., are multi-armed bandit problems in which no stochastic assumption is made about the process generating the rewards for the actions. In this paper, we extend their theory to the case in which k (≥ 1) actions are selected at each time step. We analyze the problem in two multiple-play settings: selection with repetition and selection without repetition. In both settings, we generalize the Exp3 family of algorithms developed by Auer et al. and give generalized upper bounds on the regret with respect to the best fixed set of actions.
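
The report itself contains no code, but as background for the generalization it describes, the following is a minimal Python sketch of the standard single-play Exp3 algorithm of Auer et al., i.e., the k = 1 baseline that the paper extends. The function name exp3 and the reward_fn callback are illustrative assumptions, and rewards are assumed to lie in [0, 1] as in the usual formulation; this is not the multiple-play algorithm analyzed in the paper.

    import math
    import random

    def exp3(num_arms, num_rounds, gamma, reward_fn):
        """Single-play Exp3 (Auer et al.): exponential weights mixed
        with uniform exploration, updated via importance-weighted
        reward estimates. reward_fn(t, arm) must return a reward
        in [0, 1]; it may be chosen adversarially."""
        weights = [1.0] * num_arms
        total_reward = 0.0
        for t in range(num_rounds):
            w_sum = sum(weights)
            # Mix the exponential-weight distribution with uniform
            # exploration at rate gamma, as in the original Exp3.
            probs = [(1.0 - gamma) * w / w_sum + gamma / num_arms
                     for w in weights]
            arm = random.choices(range(num_arms), weights=probs)[0]
            reward = reward_fn(t, arm)
            total_reward += reward
            # Importance-weighted estimate: dividing the observed
            # reward by the selection probability keeps the estimate
            # unbiased; unplayed arms implicitly get an estimate of 0.
            estimate = reward / probs[arm]
            weights[arm] *= math.exp(gamma * estimate / num_arms)
        return total_reward

The step that changes under multiple plays is the sampling: with repetition, k arms can be drawn independently from the same distribution, whereas without repetition the algorithm must select a set of k distinct arms whose marginal selection probabilities match the intended distribution; the paper's regret bounds cover both settings.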

Journal

  • IEICE Technical Report, 109(195), 13-20, 2009-09-07

    The Institute of Electronics, Information and Communication Engineers

References: 10

Codes

  • NII Article ID (NAID)
    110007387428
  • NII NACSIS-CAT ID (NCID)
    AN10013152
  • Text Lang
    ENG
  • Article Type
    ART
  • ISSN
    09135685
  • NDL Article ID
    10390011
  • NDL Source Classification
ZN33 (Science and Technology -- Electrical Engineering and Electrical Machinery Industry -- Electronics and Telecommunications)
  • NDL Call No.
    Z16-940
  • Data Source
CJP, NDL, NII-ELS