-
- UNEMI Tatsuo
- Department of Planning and Management, Faculty of Engineering, Nagaoka University of Technology
Bibliographic Information
- Alternative Title
-
- Instance-Based Reinforcement Learning Method
Abstract
<p>This paper proposes a reinforcement learning method based on an instance-based learning approach. The learning task is assumed as follows. The input on each learning cycle is a vector of real numbers, the output is a symbol selected from an a priori known finite set, and the reinforcement from the environment is +1, 0, or -1, usually being 0, that is, in the manner of delayed reinforcement. The last assumption makes it difficult to apply any conventional supervised concept learning scheme, because the evaluation of the output is not given at every cycle. The key idea is to propagate reinforcement backward through the memorized experiences in temporal order. The learner tends to select the output that is associated with inputs similar to the current situation and that is likely to lead to high positive reinforcement, scanning all of the past experiences stored verbatim in memory. In addition to this basic mechanism, two extensions are proposed. The first restricts the capacity of memory to avoid an unbounded increase of time and space complexity, replacing the oldest data with new data in each cycle. The second embeds a feedback mechanism concerning the reliability of each memorized experience: the reliability of an experience employed to decide the output of a recent cycle is increased when the learner receives positive reinforcement, and decreased when it receives negative reinforcement. Experimental results show that these learning algorithms work well in a domain simulating adaptive behavior, and that the extension methods are effective.</p>
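The mechanism described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's exact algorithm: the Euclidean similarity measure, the geometric decay of back-propagated reinforcement, the fixed reliability increment, and all class and parameter names (`InstanceBasedRL`, `decay`, `capacity`) are assumptions made for illustration. It shows the three components the abstract names: similarity-weighted output selection over stored experiences, bounded memory that replaces the oldest entry, and reliability feedback on recently used experiences.

```python
import math
import random
from collections import deque


class InstanceBasedRL:
    """Illustrative sketch of instance-based reinforcement learning.

    Experiences (input vector, chosen output, back-propagated value,
    reliability) are stored verbatim; outputs are scored by similarity
    of the current input to memorized inputs, weighted by reliability.
    """

    def __init__(self, actions, capacity=100, decay=0.9):
        self.actions = list(actions)
        # Bounded memory: appending past `capacity` discards the oldest entry.
        self.memory = deque(maxlen=capacity)
        self.decay = decay  # assumed geometric factor for backward propagation

    def _distance(self, x, y):
        # Euclidean distance as an assumed similarity measure.
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

    def select(self, x):
        """Pick the output whose similar, reliable experiences promise reward."""
        scores = {a: 0.0 for a in self.actions}
        for exp in self.memory:
            w = exp["reliability"] / (1.0 + self._distance(x, exp["input"]))
            scores[exp["action"]] += w * exp["value"]
        if all(abs(s) < 1e-12 for s in scores.values()):
            return random.choice(self.actions)  # no informative memory yet
        return max(scores, key=scores.get)

    def record(self, x, action):
        """Store the current cycle's experience."""
        self.memory.append({"input": list(x), "action": action,
                            "value": 0.0, "reliability": 1.0})

    def reinforce(self, r):
        """Propagate reinforcement backward and adjust reliability."""
        g = float(r)
        for exp in reversed(self.memory):
            exp["value"] += g
            g *= self.decay
            if abs(g) < 1e-3:
                break
        if self.memory and r != 0:
            # Reliability feedback on the most recently used experience.
            last = self.memory[-1]
            last["reliability"] = max(0.0,
                                      last["reliability"] + (0.1 if r > 0 else -0.1))
```

A usage round-trip would be: call `select` to choose an output, `record` the cycle, and on delayed reinforcement call `reinforce(+1)` or `reinforce(-1)`, which credits recent experiences with geometrically decaying weight.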
Journal
-
- 人工知能
-
人工知能 7 (4), 697-707, 1992-07-01
The Japanese Society for Artificial Intelligence
Details
-
- CRID
- 1390004222628654976
-
- NII Article ID
- 110002807614
-
- NII Bibliographic ID
- AN10067140
-
- ISSN
- 24358614
- 21882266
-
- Text Language Code
- ja
-
- Data Source Type
-
- JaLC
- CiNii Articles
-
- Abstract License Flag
- Not allowed