A Study on Improving Monte Carlo Tree Search for GVG-AI

Oh, HyunWoo; Kaneko, Tomoyuki (金子, 知適)

General Game Playing (GGP) is a research field that aims to build game AI agents capable of playing diverse, previously unseen games without pretraining. General Video Game-AI (GVG-AI) is the branch of GGP that targets video games. In this research, we propose two modifications to Monte Carlo Tree Search (MCTS), which is commonly used in GVG-AI. The first applies mixmax backups to GreedyUCB1, a method that uses additional rewards to speed up the discovery of small future rewards, so that the search is accurate over a wider range than with GreedyUCB1 alone. We call this method MixMax-Greedy-UCT. The second assigns a penalty to each action according to how frequently the agent has taken it, with the aim of encouraging the agent to try new actions and thereby explore new states. We call this method Novelty of Action based Penalty (NAP). We evaluated four agents on the GECCO 2015 game set: an agent using plain-UCT, an agent using MixMax-Greedy-UCT, an agent using NAP, and an agent using both MixMax-Greedy-UCT and NAP. The results showed that NAP improves agent performance, with the NAP agent performing better than the other agents.
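The abstract gives no formulas, so the following is only a rough sketch of how the two ideas might combine in a UCT selection step. The mixmax blend (a weight `q` between the maximum and mean reward) follows the form commonly used in the MCTS literature. The penalty function, its `weight`, the `stats` layout, and all function names are illustrative assumptions and not the authors' definitions. The additional reward used by GreedyUCB1 is not modeled here.

```python
import math
from collections import Counter


def ucb1(value, parent_visits, child_visits, c=math.sqrt(2)):
    """Standard UCB1 score, as used by plain UCT."""
    if child_visits == 0:
        return math.inf  # always try unvisited children first
    return value + c * math.sqrt(math.log(parent_visits) / child_visits)


def mixmax_value(mean_reward, max_reward, q=0.25):
    """Mixmax backup: blend the best and the average observed reward.

    q is an assumed mixing weight; the abstract does not state a value.
    """
    return q * max_reward + (1.0 - q) * mean_reward


def action_penalty(action, history, weight=0.1):
    """Hypothetical frequency penalty: grows with the share of past
    moves in `history` that used `action`. Assumed form, not from the source."""
    if not history:
        return 0.0
    return weight * Counter(history)[action] / len(history)


def select_action(stats, parent_visits, history):
    """Pick the action with the best penalized score.

    stats maps action -> (visits, total_reward, max_reward).
    """
    def score(action):
        visits, total, best = stats[action]
        if visits == 0:
            return math.inf
        value = mixmax_value(total / visits, best)
        return ucb1(value, parent_visits, visits) - action_penalty(action, history)

    return max(stats, key=score)


# Two actions with identical statistics; "left" was used far more often,
# so the penalty tips the choice toward "right".
stats = {"left": (10, 5.0, 1.0), "right": (10, 5.0, 1.0)}
history = ["left"] * 8 + ["right"] * 2
chosen = select_action(stats, 20, history)
```

In this sketch the penalty is subtracted inside the tree-selection score. Whether the authors apply NAP there or when choosing the final move is not stated in the abstract.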
