-
- UNEMI Tatsuo
- Department of Planning and Management, Faculty of Engineering, Nagaoka University of Technology
Bibliographic Information
- Alternative Title
-
- Instance-Based Reinforcement Learning Method
Abstract
<p>This paper proposes a reinforcement learning method based on an instance-based learning approach. The learning task is assumed as follows. The input on each learning cycle is a vector of real numbers, the output is a symbol selected from an a priori known finite set, and the reinforcement from the environment is +1, 0, or -1, usually being 0, that is, in the manner of delayed reinforcement. The last assumption makes it difficult to apply any conventional supervised concept learning scheme, because the evaluation of the output is not given at every cycle. The key idea is to propagate reinforcement backward through the memorized experiences in temporal order. The learner tends to select the output that is associated with inputs similar to the current situation and that is likely to lead to high positive reinforcement, scanning all of the past experiences stored verbatim in memory. In addition to this basic mechanism, two extensions are proposed. The first restricts the capacity of memory to avoid an unbounded increase of time and space complexity, replacing the oldest data with new data in each cycle. The second embeds a feedback mechanism concerning the reliability of each memorized experience: the reliability of an experience employed to decide the output of a recent cycle is increased when the learner receives positive reinforcement, and decreased when it receives negative reinforcement. Experimental results show that these learning algorithms work well in a domain simulating adaptive behavior, and that the extension methods are effective.</p>
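The mechanism described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's exact algorithm: the Euclidean similarity measure, the geometric decay of back-propagated reinforcement, the fixed reliability increment, and all class and parameter names (`InstanceBasedRL`, `decay`, `capacity`) are assumptions made for illustration. It shows the three components the abstract names: similarity-weighted output selection over stored experiences, bounded memory that replaces the oldest entry, and reliability feedback on recently used experiences.

```python
import math
import random
from collections import deque


class InstanceBasedRL:
    """Illustrative sketch of instance-based reinforcement learning.

    Experiences (input vector, chosen output, back-propagated value,
    reliability) are stored verbatim; outputs are scored by similarity
    of the current input to memorized inputs, weighted by reliability.
    """

    def __init__(self, actions, capacity=100, decay=0.9):
        self.actions = list(actions)
        # Bounded memory: appending past `capacity` discards the oldest entry.
        self.memory = deque(maxlen=capacity)
        self.decay = decay  # assumed geometric factor for backward propagation

    def _distance(self, x, y):
        # Euclidean distance as an assumed similarity measure.
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

    def select(self, x):
        """Pick the output whose similar, reliable experiences promise reward."""
        scores = {a: 0.0 for a in self.actions}
        for exp in self.memory:
            w = exp["reliability"] / (1.0 + self._distance(x, exp["input"]))
            scores[exp["action"]] += w * exp["value"]
        if all(abs(s) < 1e-12 for s in scores.values()):
            return random.choice(self.actions)  # no informative memory yet
        return max(scores, key=scores.get)

    def record(self, x, action):
        """Store the current cycle's experience."""
        self.memory.append({"input": list(x), "action": action,
                            "value": 0.0, "reliability": 1.0})

    def reinforce(self, r):
        """Propagate reinforcement backward and adjust reliability."""
        g = float(r)
        for exp in reversed(self.memory):
            exp["value"] += g
            g *= self.decay
            if abs(g) < 1e-3:
                break
        if self.memory and r != 0:
            # Reliability feedback on the most recently used experience.
            last = self.memory[-1]
            last["reliability"] = max(0.0,
                                      last["reliability"] + (0.1 if r > 0 else -0.1))
```

A usage round-trip would be: call `select` to choose an output, `record` the cycle, and on delayed reinforcement call `reinforce(+1)` or `reinforce(-1)`, which credits recent experiences with geometrically decaying weight.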
Journal
-
- 人工知能
-
人工知能 7 (4), 697-707, 1992-07-01
The Japanese Society for Artificial Intelligence
Details
-
- CRID
- 1390004222628654976
-
- NII Article ID
- 110002807614
-
- NII Bibliographic ID
- AN10067140
-
- ISSN
- 24358614
- 21882266
-
- Text Language Code
- ja
-
- Data Source Type
-
- JaLC
- CiNii Articles
-
- Abstract License Flag
- Not allowed