On the Convergence of Stochastic Iterative Dynamic Programming Algorithms

- Tommi Jaakkola, Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
- Michael I. Jordan, Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139 USA
- Satinder P. Singh, Department of Computer Science, University of Massachusetts, Amherst, MA 01003 USA
Abstract
Recent developments in the area of reinforcement learning have yielded a number of new algorithms for the prediction and control of Markovian environments. These algorithms, including the TD(λ) algorithm of Sutton (1988) and the Q-learning algorithm of Watkins (1989), can be motivated heuristically as approximations to dynamic programming (DP). In this paper we provide a rigorous proof of convergence of these DP-based learning algorithms by relating them to the powerful techniques of stochastic approximation theory via a new convergence theorem. The theorem establishes a general class of convergent algorithms to which both TD(λ) and Q-learning belong.
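For orientation, the Q-learning update analyzed in the paper performs a sampled version of the dynamic-programming value backup. Below is a minimal tabular sketch, assuming a Gymnasium-style discrete environment (`reset`/`step`); the function name `q_learning`, the ε-greedy exploration rule, and the constant step size `alpha` are illustrative assumptions, not the paper's construction. Note that the convergence theorem requires a decaying step-size schedule with Σ αₜ = ∞ and Σ αₜ² < ∞, which a fixed `alpha` does not satisfy.

```python
import numpy as np

def q_learning(env, num_episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1):
    """Tabular Q-learning (Watkins, 1989): a stochastic iterative
    approximation to the DP backup
        Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)).
    Assumes `env` has discrete observation and action spaces and follows
    the Gymnasium API (reset/step) -- an assumption made for illustration.
    """
    Q = np.zeros((env.observation_space.n, env.action_space.n))
    for _ in range(num_episodes):
        s, _ = env.reset()
        done = False
        while not done:
            # epsilon-greedy action selection: explore with probability epsilon
            if np.random.rand() < epsilon:
                a = env.action_space.sample()
            else:
                a = int(np.argmax(Q[s]))
            s_next, r, terminated, truncated, _ = env.step(a)
            done = terminated or truncated
            # terminal transitions contribute no bootstrapped future value
            target = r + (0.0 if terminated else gamma * np.max(Q[s_next]))
            Q[s, a] += alpha * (target - Q[s, a])
            s = s_next
    return Q
```

On a small benchmark such as `gymnasium.make("FrozenLake-v1")`, the table Q approaches the optimal action values as episodes accumulate, provided the step sizes are annealed per the stochastic-approximation conditions above.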
Published in

- Neural Computation 6 (6), 1185-1201, November 1994
- MIT Press - Journals
Details
- CRID: 1361981468552361600
- NII Article ID: 30036176680
- ISSN: 1530-888X, 0899-7667
- Data sources: Crossref, CiNii Articles