An Analysis of Actor-Critic Algorithms Using Eligibility Traces : Reinforcement Learning with Imperfect Value Functions

  • KIMURA Hajime
    Graduate School of Interdisciplinary Science and Engineering, Tokyo Institute of Technology.
  • KOBAYASHI Shigenobu
    Graduate School of Interdisciplinary Science and Engineering, Tokyo Institute of Technology.

Bibliographic Information

Other Title
  • Actorに適正度の履歴を用いたActor-Criticアルゴリズム : 不完全なValue-Functionのもとでの強化学習

Abstract

We present an analysis of actor-critic algorithms in which the actor updates its policy using eligibility traces of the policy parameters. Most theoretical results on eligibility traces concern only the critic's value iteration algorithms. This paper investigates what the actor's eligibility trace does. The results show that the algorithm is an extension of Williams' REINFORCE algorithms to infinite-horizon reinforcement learning tasks, and that the critic provides an appropriate reinforcement baseline for the actor. Thanks to the actor's eligibility trace, the actor improves its policy using a gradient of the actual return rather than a gradient of the return estimated by the critic. This enables the agent to learn a fairly good policy even when the value function approximated by the critic is too inaccurate for conventional actor-critic algorithms. Moreover, when the critic estimates an accurate value function, the actor's learning is dramatically accelerated in our test cases. The behavior of the algorithm is demonstrated through simulations of a linear quadratic control problem and a pole-balancing problem.
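The mechanism described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on assumptions that the abstract does not state: a tabular softmax policy, a tabular critic updated by one-step TD, and hypothetical parameter names (`gamma` for the reward discount, `beta` for the decay of the actor's trace, `alpha_actor` and `alpha_critic` for step sizes). The actor accumulates the gradient of the log-probability of each chosen action in a decaying trace, and every policy parameter is then moved by the critic's TD error times that trace, so earlier actions keep receiving credit from later rewards.

```python
import math


def softmax(prefs):
    """Numerically stable softmax over a list of action preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def actor_critic_step(theta, v, trace, s, a, r, s_next,
                      gamma=0.9, beta=0.8,
                      alpha_actor=0.1, alpha_critic=0.1):
    """Apply one in-place update after observing (s, a, r, s_next).

    theta : list of lists, action preferences per state (actor parameters)
    v     : list of state-value estimates (critic)
    trace : list of lists shaped like theta, the actor's eligibility trace
    All hyperparameter names and defaults are illustrative assumptions.
    """
    # Critic's TD error; V(s) plays the role of a reinforcement baseline.
    td_error = r + gamma * v[s_next] - v[s]

    # Decay the whole trace, then add the gradient of log pi(a | s).
    # For a softmax policy that gradient is onehot(a) - pi(. | s).
    for row in trace:
        for j in range(len(row)):
            row[j] *= beta
    pi = softmax(theta[s])
    for j in range(len(pi)):
        trace[s][j] += (1.0 if j == a else 0.0) - pi[j]

    # Actor: move every parameter along its trace, scaled by the TD error.
    for i in range(len(theta)):
        for j in range(len(theta[i])):
            theta[i][j] += alpha_actor * td_error * trace[i][j]

    # Critic: one-step TD update (a simplification chosen for this sketch).
    v[s] += alpha_critic * td_error
    return td_error
```

With `beta = 0` the trace reduces to the current log-probability gradient, which corresponds to a conventional actor-critic update driven only by the critic's estimate. With `beta` close to 1 the update relies more on rewards actually received after each action, which is the property the abstract credits for robustness to an inaccurate critic.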

Citations (27)

References (26)
