囚人のジレンマゲームにおけるQ学習による協調の維持 (Maintaining Cooperation by Q-learning in Prisoner's Dilemma Games)

Koichi Moriyama (森山 甲一)

doi:10.11309/jssst.25.4_145

Bibliographic Information

Alternative Title

How does Q-learning Maintain Cooperation in Prisoner's Dilemma Games?

Abstract

This work studies Q-learning in a multiagent environment. Many multiagent Q-learning methods exist, and most of them aim to converge to a Nash equilibrium, which is undesirable in games such as the Prisoner's Dilemma (PD). However, ordinary Q-learning agents that choose actions stochastically to avoid local optima may arrive at mutual cooperation in PD. Although such mutual cooperation usually occurs only in isolation, it can be maintained if the Q-function of cooperation becomes larger than that of defection after the cooperation. This work derives a theorem on how many times mutual cooperation is needed to make the Q-function of cooperation larger than that of defection. In addition, building on the author's previous works, which distinguish utilities from rewards and use utilities for learning in PD, this work derives a corollary on how much utility is necessary for a single instance of mutual cooperation to make the Q-function of cooperation larger.
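The question the abstract poses can be illustrated with a minimal sketch. It is not the paper's theorem or derivation: it assumes a stateless Q-learning update of the common form Q(a) ← (1 − α)Q(a) + α(r + γ max Q), and the function name `cooperations_needed`, the learning rate `alpha`, the discount `gamma`, the reward value, and the initial Q-values are all hypothetical choices made only for illustration. The sketch counts how many consecutive mutual cooperations are needed before the Q-value of cooperation exceeds that of defection.

```python
def cooperations_needed(q_c, q_d, reward=3.0, alpha=0.1, gamma=0.5, max_steps=1000):
    """Count consecutive mutual cooperations until Q(C) > Q(D).

    Assumed (not taken from the paper): a stateless Q-learning update,
    with Q(D) held fixed because only cooperation is being played.
    Returns None if Q(C) never exceeds Q(D) within max_steps.
    """
    for n in range(1, max_steps + 1):
        # Standard Q-learning target: immediate reward plus discounted best value.
        target = reward + gamma * max(q_c, q_d)
        q_c = (1 - alpha) * q_c + alpha * target
        if q_c > q_d:
            return n
    return None


if __name__ == "__main__":
    # Hypothetical starting point: Q(D) = 2.0, Q(C) = 1.0.
    print(cooperations_needed(q_c=1.0, q_d=2.0, alpha=0.1))
    print(cooperations_needed(q_c=1.0, q_d=2.0, alpha=0.5))
```

Under these assumptions, a larger learning rate or a larger reward for mutual cooperation shortens the required run of cooperation, and if the reward is too small the threshold is never crossed. This is the intuition behind asking how much utility a single mutual cooperation would need to provide.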

Journal

Computer Software (コンピュータソフトウェア)

Computer Software 25 (4), 145-153, 2008

Japan Society for Software Science and Technology (日本ソフトウェア科学会)

Details

CRID: 1390001204736359808

NII Article ID: 130004892116

DOI: 10.11309/jssst.25.4_145

ISSN: 02896540

Data Source

JaLC
CiNii Articles

Abstract License Flag: Disallowed
