Learning Weights of Training Data by Game Results [in Japanese]


Author(s)

    • 佐藤 佳州 Yoshikuni Sato
    • Graduate School of Systems and Information Engineering, University of Tsukuba (presently with Advanced Technology Research Laboratories, Panasonic Corporation)

Abstract

Recently, machine learning has attracted much attention in the field of game programming, where it has been used successfully to tune many kinds of parameters, including evaluation functions, search depth, and playout policies in Monte-Carlo Tree Search. Existing machine learning methods in game programming use game records of human expert players as training data and adjust the parameters so that the program's moves approach the experts' moves. However, in some games such as shogi, computer programs are already approaching the strength of top human players, so simply reproducing human moves does not necessarily produce a strong computer player. In this paper, we propose a learning method that assigns an importance weight to each piece of training data. The method combines two kinds of learning: learning the importance weights by evolutionary computation, with the winning rate as the fitness, and tuning the parameters according to those weights. We applied the method to learning a shogi evaluation function, realization probabilities, and playout policies; in matches against programs trained by the conventional method, the programs trained by the proposed method won significantly more games, demonstrating its effectiveness. Moreover, the experimental results show that the importance of training data varies with features such as the progress of the game and the tactics involved, indicating that using training data effectively makes it possible to acquire knowledge that yields stronger programs.
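As a rough illustration of the two-level scheme described above (importance weights learned by evolutionary computation with the winning rate as fitness, and parameters learned according to those weights), the following Python sketch uses a toy setting. Everything in it is an assumption, not the paper's method: the parameter is a single number, each training record simply suggests a value for it, weighted learning is a weighted mean, the "winning rate" is an expected win probability from a hypothetical logistic match model instead of real games, and the evolutionary step is a plain (1+1) evolution strategy with log-normal mutation. The names `train`, `expected_win_rate`, and `evolve_weights` are hypothetical.

```python
import math
import random

# Toy stand-in (hypothetical): each training record is one number that a
# teacher suggests for a scalar parameter theta; some teachers are biased.


def train(targets, weights):
    """Weighted parameter learning (toy): the weighted mean is the closed-form
    minimiser of sum_i w_i * (theta - y_i)^2."""
    return sum(w * y for w, y in zip(weights, targets)) / sum(weights)


def expected_win_rate(theta, baseline, optimum, scale=1.0):
    """Hypothetical match model: the program whose parameter lies closer to a
    hidden optimum wins more often (logistic in the distance gap). Real games
    would give a noisy empirical win rate instead."""
    gap = abs(baseline - optimum) - abs(theta - optimum)
    return 1.0 / (1.0 + math.exp(-gap / scale))


def evolve_weights(targets, optimum, generations=200, sigma=0.3, seed=0):
    """(1+1) evolution strategy over per-record importance weights, using the
    win rate against the equal-weight (conventional) program as fitness."""
    rng = random.Random(seed)
    n = len(targets)
    baseline = train(targets, [1.0] * n)  # conventional learning: equal weights
    weights = [1.0] * n
    fitness = expected_win_rate(train(targets, weights), baseline, optimum)
    for _ in range(generations):
        # Log-normal mutation keeps every weight strictly positive.
        child = [w * math.exp(rng.gauss(0.0, sigma)) for w in weights]
        child_fitness = expected_win_rate(train(targets, child), baseline, optimum)
        if child_fitness >= fitness:  # keep the child if it wins at least as often
            weights, fitness = child, child_fitness
    return weights, fitness, baseline


if __name__ == "__main__":
    # Four records from a good teacher near 1.0, two from a biased one.
    targets = [1.0, 1.1, 0.9, 1.05, 3.0, 2.8]
    weights, fitness, baseline = evolve_weights(targets, optimum=1.0)
    print("baseline theta:", round(baseline, 3))
    print("weighted theta:", round(train(targets, weights), 3))
    print("win rate vs baseline:", round(fitness, 3))
    print("weights:", [round(w, 2) for w in weights])
```

Because a child is kept only when its win rate is at least the current one, and the equal-weight starting point scores exactly 0.5 against the equal-weight baseline, the evolved weights never do worse than the conventional program under this toy model. In the usage example, they should tend to shrink the weights of the two records from the biased teacher.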


Journal

  • IPSJ Journal

    IPSJ Journal 55(11), 2399-2409, 2014-11-15

    Information Processing Society of Japan (IPSJ)

Codes

  • NII Article ID (NAID)
    110009843047
  • NII NACSIS-CAT ID (NCID)
    AN00116647
  • Text Lang
    JPN
  • Article Type
    Journal Article
  • ISSN
    1882-7764
  • Data Source
    NII-ELS  IPSJ 