行動評価関数を用いたモンテカルロ木探索の重点化と見落としの抑制  [in Japanese] Knowledge Bias in Monte-Carlo Tree Search and Techniques for Reducing Oversight Mistakes  [in Japanese]

Access this Article

Search this Article

Author(s)

Abstract

モンテカルロ木探索は現在囲碁プログラムの主流であり,基本となるアルゴリズムにさまざまに工夫が加えられ用いられている.シミュレーション部分において,行動評価関数などを用いて良い手を高い確率で打つことは,全合法手を等確率で選ぶ場合に比べ効果的であることはよく知られる.この評価関数は木探索部分への利用も可能であり,有望な着手に探索を重点化したり,あるいはそれらのみに探索を着手限定したりといったことも行われる.本論文では,行動評価関数を木探索部分で活用する着手限定と重点化の2つの方法の効果やパラメータの影響,組合せた場合の性能を,囲碁プログラムNomitan,Fuegoを用いた実験により示す.そのうえで,着手限定で生じる"見落とし"を抑制するための3つの方法を提案し,NomitanのFuegoに対する勝率が4,000試合ずつの実験で57.7%から64.5%に向上したことを示す.Monte-Carlo Tree Search is now the most popular method for the game of Go, and many techniques and variations are used to improve the strength. It is already known that biased Monte-Carlo simulations using a probability model containing static knowledge are more efficient than random simulations. Such probability models can be also used in the tree search policy to bias the search or limit the search to a subset of the legal moves. In this article, first we describe more precisely how static knowledge can be used to improve the tree search policy. Then, we show how to reduce the oversight mistakes caused by the limitation of the number of searched moves. We confirm experimentally the efficiency of the proposed methods, with a large number of games using our Go program Nomitan, against Fuego, an open source program. The winning ratio of our program is increased from 57.7% to 64.5% (4,000 games for each).

Monte-Carlo Tree Search is now the most popular method for the game of Go, and many techniques and variations are used to improve the strength. It is already known that biased Monte-Carlo simulations using a probability model containing static knowledge are more efficient than random simulations. Such probability models can be also used in the tree search policy to bias the search or limit the search to a subset of the legal moves. In this article, first we describe more precisely how static knowledge can be used to improve the tree search policy. Then, we show how to reduce the oversight mistakes caused by the limitation of the number of searched moves. We confirm experimentally the efficiency of the proposed methods, with a large number of games using our Go program Nomitan, against Fuego, an open source program. The winning ratio of our program is increased from 57.7% to 64.5% (4,000 games for each).

Journal

  • IPSJ Journal

    IPSJ Journal 55(11), 2377-2388, 2014-11-15

    Information Processing Society of Japan (IPSJ)

Codes

  • NII Article ID (NAID)
    110009843044
  • NII NACSIS-CAT ID (NCID)
    AN00116647
  • Text Lang
    JPN
  • Article Type
    Journal Article
  • ISSN
    1882-7764
  • Data Source
    NII-ELS  IR  IPSJ 
Page Top