A Theoretical Study of Reward Assignment in Reinforcement Learning (強化学習における報酬割当ての理論的考察)

Bibliographic Information

Other Title
  • A Theory of Profit Sharing in Reinforcement Learning

Abstract

<p>Reinforcement learning is a form of machine learning that aims to adapt a system to a given environment on the basis of rewards. We consider profit sharing, a representative reinforcement learning method. The sequence of rules applied between one reward and the next is called an episode, and profit sharing reinforces the rules of each episode. A function that distributes the reward among the rules of an episode is called a reinforcement function. Previous work has used ad hoc functions; this paper analyzes reinforcement functions theoretically. First, we examine which reinforcement functions are locally reasonable. We call a rule ineffective if and only if it always lies on a detour, whichever episode it appears in. It is locally reasonable for ineffective rules to be suppressed more strongly than any effective rule. We derive a necessary and sufficient condition for suppressing every ineffective rule, given by the inequality $L \sum_{j=i}^{W} f_j < f_{i-1}$ $(i = 1, \dots, W)$, where $L$ is the maximum number of conflicting effective rules, $W$ is the maximum length of an episode, and $f_j$ is the reinforcement value for the $j$-th rule applied before the reward. We call this the ineffective rule suppression theorem. We demonstrate that profit sharing can learn ineffective rules when the condition is violated. Second, we examine whether reinforcement functions that satisfy the condition are globally reasonable. We call a set of effective rules a rule selection plan if and only if it selects at most one effective rule per state. It is globally reasonable for a plan to keep obtaining reward. We show that the same condition is also necessary and sufficient for learning a rewardful plan, and we call this the rewardful plan acquisition theorem. We also demonstrate that profit sharing can learn rewardless plans when the condition is violated.</p>
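The condition in the abstract can be checked mechanically for a candidate reinforcement function. The following is a minimal sketch, not the authors' implementation. It assumes that f[0] is the reinforcement for the rule applied immediately before the reward, that f[j] belongs to the rule applied j steps earlier, and that the list holds W + 1 values. The function names, the (state, action) encoding of a rule, and the geometric form of the reinforcement function are all hypothetical choices made for illustration.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

# Hypothetical encoding of a "rule" as a (state, action) pair.
Rule = Tuple[Hashable, Hashable]


def satisfies_suppression_condition(f: Sequence[float], L: int) -> bool:
    """Return True if L * sum_{j=i}^{W} f[j] < f[i-1] holds for i = 1..W.

    Assumption: f[0] is the value for the rule applied immediately before
    the reward, f[j] the value for the rule j steps earlier, W = len(f) - 1.
    """
    W = len(f) - 1
    tail = 0.0  # running value of sum_{j=i}^{W} f[j]
    for i in range(W, 0, -1):
        tail += f[i]
        if not (L * tail < f[i - 1]):
            return False
    return True


def geometric_reinforcement(reward: float, ratio: float, W: int) -> List[float]:
    """Illustrative reinforcement function: f[j] = reward * ratio**j."""
    return [reward * ratio ** j for j in range(W + 1)]


def reinforce_episode(
    weights: Dict[Rule, float], episode: Sequence[Rule], f: Sequence[float]
) -> None:
    """Add f[j] to the weight of the rule applied j steps before the reward.

    Assumes len(episode) <= len(f); the episode is listed oldest rule first.
    """
    for j, rule in enumerate(reversed(episode)):
        weights[rule] = weights.get(rule, 0.0) + f[j]


if __name__ == "__main__":
    L = 3
    print(satisfies_suppression_condition(geometric_reinforcement(1.0, 0.25, 5), L))
    print(satisfies_suppression_condition(geometric_reinforcement(1.0, 0.5, 5), L))
```

Under these assumptions and with L = 3, a geometric function with ratio 1/4 passes the check, while ratio 1/2 fails it. A constant function fails for any L of at least 1, which is consistent with the abstract's warning that profit sharing can learn ineffective rules when the condition is violated.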

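Journal

  • Journal of the Japanese Society for Artificial Intelligence (人工知能)

    Journal of the Japanese Society for Artificial Intelligence 9 (4), 580-587, 1994-07-01

    The Japanese Society for Artificial Intelligence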

Cited by (98)

References (17)

Details

  • CRID
    1390004222628867456
  • NII Article ID
    110002807845
  • NII Book ID
    AN10067140
  • DOI
    10.11517/jjsai.9.4_580
  • ISSN
    24358614
    21882266
  • Text language code
    ja
  • Data source type
    • JaLC
    • CiNii Articles
  • Abstract license flag
    Disallowed
