Reinforcement Learning to Create Value and Policy Functions Using Minimax Tree Search in Hex

HANDLE  Open Access

Abstract

Recently, the use of reinforcement-learning algorithms has been proposed to create value and policy functions, and their effectiveness has been demonstrated using Go, Chess, and Shogi. In previous studies, the policy function was trained to predict the search probabilities of each move output by Monte Carlo tree search; thus, a large number of simulations was required to obtain the search probabilities. We propose a reinforcement-learning algorithm based on self-play that creates value and policy functions such that the policy function is trained directly from the game results, without the search probabilities. In this study, we use Hex, a board game developed by Piet Hein, to evaluate the proposed method. We demonstrate the effectiveness of the proposed learning algorithm in terms of policy-function accuracy, and we play a tournament between the proposed computer Hex program, DeepEZO, and the 2017 world-champion programs. The tournament results demonstrate that DeepEZO outperforms all other programs. DeepEZO achieved a winning percentage of 79.3% against the world-champion program MoHex2.0 under the same search conditions on a $13 \times 13$ board. We also show that highly accurate policy functions can be created by training them to increase the number of moves to be searched in losing positions.
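
The sketch below is a rough illustration (not taken from the paper) of the idea stated in the abstract: the policy network is trained directly from self-play game results rather than from Monte Carlo tree search visit counts, and in positions where the player to move eventually lost, the policy is pushed toward a flatter distribution so that more candidate moves get searched. The network architecture, the `flatten_weight` parameter, the data layout, and the entropy-based formulation of "increase the number of moves to be searched" are assumptions made for illustration only.

```python
# Minimal PyTorch sketch, under the assumptions stated above.
import torch
import torch.nn as nn
import torch.nn.functional as F

BOARD = 13 * 13  # one move logit per cell of a 13x13 Hex board

policy_net = nn.Sequential(
    nn.Linear(2 * BOARD, 256), nn.ReLU(),
    nn.Linear(256, BOARD),  # move logits
)
optimizer = torch.optim.Adam(policy_net.parameters(), lr=1e-3)

def policy_loss(states, played_moves, winners, legal_masks, flatten_weight=0.1):
    """states: (N, 2*BOARD) float features; played_moves: (N,) move indices;
    winners: (N,) +1 if the player to move won the game, -1 otherwise;
    legal_masks: (N, BOARD) bool mask of legal moves."""
    logits = policy_net(states)
    logits = logits.masked_fill(~legal_masks, float("-inf"))
    log_probs = F.log_softmax(logits, dim=1)

    won = winners > 0
    lost = ~won

    # Winner positions: cross-entropy toward the move that was actually played,
    # using only the game result (no search probabilities as targets).
    if won.any():
        win_loss = -log_probs[won].gather(1, played_moves[won].unsqueeze(1)).mean()
    else:
        win_loss = logits.new_zeros(())

    # Loser positions: maximize entropy over legal moves, i.e. flatten the
    # distribution so that more moves are considered during search.
    if lost.any():
        lp = log_probs[lost].clamp(min=-30)  # avoid 0 * (-inf) on illegal moves
        entropy = -(lp.exp() * lp).sum(dim=1).mean()
        lose_loss = -flatten_weight * entropy
    else:
        lose_loss = logits.new_zeros(())

    return win_loss + lose_loss

# One training step on a batch drawn from self-play games would then look like:
#   loss = policy_loss(states, played_moves, winners, legal_masks)
#   optimizer.zero_grad(); loss.backward(); optimizer.step()
```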

Published in

  • IEEE Transactions on Games, 12 (1), 63-73, March 2020

    IEEE (Institute of Electrical and Electronics Engineers)

Details

  • CRID
    1050003824771246336
  • NII Article ID
    120006843074
  • HANDLE
    2115/77885
  • ISSN
    2475-1502
  • Text Language Code
    en
  • Material Type
    journal article
  • Data Source Type
    • IRDB
    • CiNii Articles
