Reinforcement Learning of Multi-Party Trading Dialog Policies

DOI Web Site 18 References Open Access

Abstract

<p>Trading dialogs are a kind of negotiation in which an exchange of ownership of items is discussed, and these kinds of dialogs are pervasive in many situations. Recently, there has been an increasing amount of research on applying reinforcement learning (RL) to negotiation dialog domains. However, in previous research, the focus was on negotiation dialog between two participants only, ignoring cases where negotiation takes place between more than two interlocutors. In this paper, as a first study on multi-party negotiation, we apply RL to a multi-party trading scenario where the dialog system (learner) trades with one, two, or three other agents. We experiment with different RL algorithms and reward functions. We use Q-learning with linear function approximation, least-squares policy iteration, and neural fitted Q iteration. In addition, to make the learning process more efficient, we introduce an incremental reward function. The negotiation strategy of the learner is learned through simulated dialog with trader simulators. In our experiments, we evaluate how the performance of the learner varies depending on the RL algorithm used and the number of traders. Furthermore, we compare the learned dialog policies with two strong hand-crafted baseline dialog policies. Our results show that (1) even in simple multi-party trading dialog tasks, learning an effective negotiation policy is not a straightforward task and requires a lot of experimentation; and (2) the use of neural fitted Q iteration combined with an incremental reward function produces negotiation policies as effective or even better than the policies of the two strong hand-crafted baselines.</p>

Journal

References(18)*help

See more

Related Projects

See more

Details 詳細情報について

Report a problem

Back to top