Reinforcement Learning of Multi-Party Trading Dialog Policies
- Hiraoka Takuya, Nara Institute of Science and Technology
- Georgila Kallirroi, University of Southern California Institute for Creative Technologies
- Nouri Elnaz, University of Southern California Institute for Creative Technologies
- Traum David, University of Southern California Institute for Creative Technologies
- Nakamura Satoshi, Nara Institute of Science and Technology
Abstract
Trading dialogs are a kind of negotiation in which an exchange of ownership of items is discussed, and they are pervasive in many situations. Recently, there has been an increasing amount of research on applying reinforcement learning (RL) to negotiation dialog domains. However, previous research has focused on negotiation dialog between two participants only, ignoring cases where negotiation takes place among more than two interlocutors. In this paper, as a first study on multi-party negotiation, we apply RL to a multi-party trading scenario where the dialog system (learner) trades with one, two, or three other agents. We experiment with different RL algorithms and reward functions: we use Q-learning with linear function approximation, least-squares policy iteration, and neural fitted Q iteration. In addition, to make the learning process more efficient, we introduce an incremental reward function. The negotiation strategy of the learner is learned through simulated dialog with trader simulators. In our experiments, we evaluate how the performance of the learner varies depending on the RL algorithm used and the number of traders. Furthermore, we compare the learned dialog policies with two strong hand-crafted baseline dialog policies. Our results show that (1) even in simple multi-party trading dialog tasks, learning an effective negotiation policy is not straightforward and requires a lot of experimentation; and (2) neural fitted Q iteration combined with an incremental reward function produces negotiation policies that are as effective as, or even better than, the policies of the two strong hand-crafted baselines.
Journal
- Transactions of the Japanese Society for Artificial Intelligence
Transactions of the Japanese Society for Artificial Intelligence 31 (4), B-FC1_1-14, 2015
The Japanese Society for Artificial Intelligence
Details
- CRID: 1390282680085825152
- NII Article ID: 130006887093
- ISSN: 13468030, 13460714
- Text Lang: en
- Data Source: JaLC, Crossref, CiNii Articles, KAKEN
- Abstract License Flag: Disallowed