An Information-Theoretic Analysis of Return Maximization in Reinforcement Learning
Abstract
We present a general analysis of return maximization in reinforcement learning. This analysis does not require assumptions of Markovianity, stationarity, and ergodicity for the stochastic sequential decision processes of reinforcement learning. Instead, our analysis assumes the asymptotic equipartition property fundamental to information theory, providing a substantially different view from that in the literature. As our main results, we show that return maximization is achieved by the overlap of typical and best sequence sets, and we present a class of stochastic sequential decision processes with the necessary condition for return maximization. We also describe several examples of best sequences in terms of return maximization in the class of stochastic sequential decision processes, which satisfy the necessary condition.
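For context, the asymptotic equipartition property (AEP) that the abstract invokes can be stated in its standard information-theoretic form; this is a general sketch, and the paper's precise conditions on the decision process may differ:

```latex
% Standard AEP (a general formulation; the paper's exact assumptions may differ).
% For a stochastic process X_1, X_2, ... with entropy rate H, the AEP requires
%   -\tfrac{1}{n} \log p(X_1, \dots, X_n) \;\longrightarrow\; H
% in probability as n \to \infty. The typical set
%   A_\epsilon^{(n)} = \left\{ x^n : \left| -\tfrac{1}{n}\log p(x^n) - H \right| \le \epsilon \right\}
% then satisfies \Pr\!\left(A_\epsilon^{(n)}\right) \to 1 and contains roughly 2^{nH} sequences,
% each with probability close to 2^{-nH}.
\[
  -\frac{1}{n}\log p(X_1,\dots,X_n) \xrightarrow{\;p\;} H,
  \qquad
  A_\epsilon^{(n)} = \left\{ x^n : \left| -\frac{1}{n}\log p(x^n) - H \right| \le \epsilon \right\}.
\]
```

Under this property, almost all probability mass concentrates on the typical set, which is what allows the paper to characterize return maximization through the overlap of the typical sequence set with the set of best sequences.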
Published in

- Neural Networks 24 (10), 1074-1081, 2011-12 (Elsevier)
Details
- CRID: 1050859536460069248
- NII Article ID: 120006353330
- NII Bibliographic ID: AA10680676
- ISSN: 08936080
- Text language code: en
- Material type: journal article
- Data sources: IRDB, Crossref, CiNii Articles, KAKEN