Combining Local and Global Exploration via Intrinsic Rewards

  • BOUGIE Nicolas
    The Graduate University for Advanced Studies, SOKENDAI National Institute of Informatics
  • ICHISE Ryutaro
    The Graduate University for Advanced Studies, SOKENDAI National Institute of Informatics

Abstract

Reinforcement learning methods rely on well-designed rewards provided by the environment. However, rewards are often sparse in the real world, so exploration remains one of the key challenges of reinforcement learning. While prior work on intrinsic motivation holds promise for better local exploration, discovering global exploration strategies is beyond the reach of current methods. We propose a novel end-to-end intrinsic reward formulation that introduces high-level exploration into reinforcement learning. Our technique decomposes the exploration bonus into a fast reward that handles local exploration and a slow reward that incentivizes long-horizon exploration. We formulate curiosity as the error in an agent's ability to reconstruct observations given their contexts. We further propose to balance local and high-level strategies by estimating state diversity. Experimental results show that this long-horizon exploration bonus enables our agents to outperform prior work on most tasks, including Minigrid and Atari games.
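
The abstract describes three moving parts: a fast, quickly-adapting curiosity bonus for local exploration, a slow, slowly-adapting bonus for long-horizon exploration, and a state-diversity estimate that balances the two, with curiosity measured as the error in reconstructing an observation from its context. The sketch below is only a minimal illustration of that decomposition, assuming simple linear reconstruction models, the previous observation as the "context", and a variance-based diversity estimate; none of the class names, learning rates, or estimators are taken from the paper.

```python
import numpy as np


class ReconstructionCuriosity:
    """Toy linear model that predicts an observation from its context
    (here: the previous observation). The prediction error serves as an
    intrinsic reward. Illustrative stand-in, not the authors' architecture."""

    def __init__(self, obs_dim, lr):
        self.W = np.zeros((obs_dim, obs_dim))
        self.lr = lr

    def reward_and_update(self, context_obs, obs):
        pred = self.W @ context_obs
        err = obs - pred
        bonus = float(np.mean(err ** 2))                  # curiosity = reconstruction error
        self.W += self.lr * np.outer(err, context_obs)    # one SGD step on the model
        return bonus


class LocalGlobalBonus:
    """Combines a fast (quickly updated, local) and a slow (slowly updated,
    long-horizon) curiosity model. The mixing weight beta comes from an
    estimate of recent state diversity: the more diverse the visited states,
    the more weight goes to the slow, long-horizon bonus. Hypothetical
    hyperparameters throughout."""

    def __init__(self, obs_dim, window=100):
        self.fast = ReconstructionCuriosity(obs_dim, lr=1e-1)  # adapts fast -> local exploration
        self.slow = ReconstructionCuriosity(obs_dim, lr=1e-3)  # adapts slowly -> long-horizon exploration
        self.recent = []                                        # sliding window for the diversity estimate
        self.window = window

    def intrinsic_reward(self, context_obs, obs):
        r_fast = self.fast.reward_and_update(context_obs, obs)
        r_slow = self.slow.reward_and_update(context_obs, obs)
        # Diversity: mean per-dimension variance over recent observations,
        # squashed into [0, 1). The paper's estimator may differ.
        self.recent.append(obs)
        if len(self.recent) > self.window:
            self.recent.pop(0)
        diversity = float(np.mean(np.var(np.stack(self.recent), axis=0)))
        beta = diversity / (1.0 + diversity)
        return (1.0 - beta) * r_fast + beta * r_slow
```

At each environment step, such a combined bonus would typically be scaled and added to the extrinsic reward, e.g. `r = r_ext + lam * bonus.intrinsic_reward(prev_obs, obs)`.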

Details

  • CRID
    1390285300166126848
  • NII Article ID
    130007856969
  • DOI
    10.11517/pjsai.jsai2020.0_2k6es205
  • Text language code
    ja
  • Data source type
    • JaLC
    • CiNii Articles
  • Abstract license flag
    Not available
