状態非依存の方策を用いた新しい強化学習手法の提案 (Proposal of New Reinforcement Learning with a State-independent Policy) [in Japanese]

Abstract

Reinforcement learning (RL) algorithms usually have difficulty learning the optimal control policy as the dimensionality of the state (and action) space grows, because the search space to be optimized increases explosively. To avoid this explosion, we propose the BASLEM algorithm (Blind Action Sequence Learning with EM algorithm), which acquires a state-independent, time-dependent control policy starting from a fixed initial state. A numerical simulation of controlling a non-holonomic system shows that RL of state-independent, time-dependent policies attains a large improvement in efficiency over the existing RL algorithm.
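
The abstract does not give BASLEM's update rule, so the following is only a minimal sketch of the general idea it describes: a policy that depends on the time step but not on the state, learned from a fixed initial state. It is not the authors' algorithm. The unicycle model, the goal position, the reward shape, the fixed exploration noise `sigma`, and the reward-weighted averaging step (a common EM-style update) are all assumptions chosen for illustration, as are the function names.

```python
import math
import random


def rollout(omegas, v=0.1):
    """Drive a unicycle (a simple non-holonomic system) from the fixed
    start pose (x=0, y=0, heading=0) using open-loop turn commands."""
    x = y = theta = 0.0
    for w in omegas:
        theta += w                 # turn first
        x += v * math.cos(theta)   # then move along the current heading
        y += v * math.sin(theta)
    return x, y


def reward(omegas, goal=(0.5, 0.5), beta=20.0):
    """Strictly positive return: close to 1 when the final position is
    near the (assumed) goal. Positivity lets it serve as an EM weight."""
    x, y = rollout(omegas)
    d2 = (x - goal[0]) ** 2 + (y - goal[1]) ** 2
    return math.exp(-beta * d2)


def learn_action_sequence(horizon=10, iters=60, samples=50, sigma=0.3, seed=0):
    """Learn one mean action per time step; the state is never observed.

    Each iteration samples action sequences around the current means and
    replaces every mean with the reward-weighted average of the samples.
    """
    rng = random.Random(seed)
    mu = [0.0] * horizon
    for _ in range(iters):
        seqs = [[rng.gauss(m, sigma) for m in mu] for _ in range(samples)]
        weights = [reward(s) for s in seqs]
        total = sum(weights)
        if total <= 0.0:           # guard against numerical underflow
            continue
        mu = [
            sum(w * s[t] for w, s in zip(weights, seqs)) / total
            for t in range(horizon)
        ]
    return mu


if __name__ == "__main__":
    mu = learn_action_sequence()
    print("reward before learning:", reward([0.0] * len(mu)))
    print("reward after learning: ", reward(mu))
```

In this sketch the number of learned parameters equals the horizon length and does not grow with the dimensionality of the state, which is the property the abstract points to as the source of the efficiency gain.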

Journal

  • Transactions of the Institute of Systems, Control and Information Engineers 27(8), 327-332, 2014

    THE INSTITUTE OF SYSTEMS, CONTROL AND INFORMATION ENGINEERS (ISCIE)

Codes

  • NII Article ID (NAID)
    130004707732
  • NII NACSIS-CAT ID (NCID)
    AN1013280X
  • Text Lang
    JPN
  • ISSN
    1342-5668
  • NDL Article ID
    025637975
  • NDL Call No.
    Z14-195
  • Data Source
    NDL, J-STAGE