Learning to understand multimodal commands and feedback for human-robot interaction

Bibliographic Information
Title

Learning to understand multimodal commands and feedback for human-robot interaction

Other Title

ヒューマンロボットインタラクションのためのマルチモーダルなコマンドとフィードバックの理解の学習

Author

Anja Nicole Austermann

Author (another name)

アーニャ ニコル オースタマン

University

総合研究大学院大学 (The Graduate University for Advanced Studies, SOKENDAI)

Types of degree

博士 (情報学) (Doctor of Philosophy in Informatics)

Grant ID

甲第1384号

Degree year

2010-09-30

Note and Description

Doctoral dissertation (博士論文)

Understanding a user's natural interaction is a challenge that needs to be addressed in order to enable novice users to operate robots smoothly and intuitively. While using a set of hard-coded commands to control a robot is usually reliable and easy to implement, it is troublesome for the user, because it requires him or her to learn and remember special commands in order to interact with the robot and does not allow him or her to use a natural interaction style. Understanding natural, unrestricted spoken language and multimodal user behavior would be desirable but remains an unsolved problem. Therefore, this dissertation proposes a domain-specific approach that enables a robot to learn to understand its user's natural way of giving commands and feedback through natural interaction in special virtual training tasks. The user teaches the robot to understand his or her individual way of expressing approval, disapproval and a limited number of commands using speech, prosody and touch.

In order to enable the robot to pro-actively explore how the user gives commands and to provoke approving and disapproving reactions, the system uses special training tasks. During training, the robot cannot actually understand its user. To enable the robot to react appropriately anyway, the training tasks are designed so that the robot can anticipate the user's commands and feedback, e.g. by using games which allow the user to judge easily whether a move of the robot was good or bad and to give appropriate feedback. The robot can thus accurately guess whether to expect positive or negative feedback and can even provoke the feedback it wants to learn by deliberately making good or bad moves. In this work, "virtual" training tasks are used to avoid time-consuming walking motion and to give the robot instant access to all properties of the task. The task scene is shown on a screen, and the robot visualizes its actions through motion, sounds and its LEDs.

A first experiment on learning positive and negative feedback used simple games, such as "Connect Four" and "Pairs", in which the robot could explore the user's feedback behavior by making good or bad moves. In a follow-up study, conducted with a child-sized humanoid robot as well as the pet robot AIBO, the approach was extended to learning simple commands. These experiments used a "virtual living room", a simplified living-room scene in which the user can ask the robot to fulfill tasks such as switching on the TV or serving a coffee. After learning the names of the different objects in the room by pointing at them and asking the user to name them, the robot requests the task server to show a situation that requires a certain action to be performed by the robot; for example, the light is switched off so that the room is too dark. The user responds to this situation by giving the appropriate command to the robot, such as "Hey robot, can you switch the light on?" or "It's too dark here!". By performing the requested action correctly or incorrectly, the robot can provoke positive or negative feedback from the user. One benefit of "virtual" training tasks is that the robot can learn commands that the user cannot teach by demonstration but that are necessary for a service or entertainment robot, such as showing the battery status, recharging, or shutting down. The robot learns by a two-stage algorithm based on Hidden Markov Models and classical conditioning, which is inspired by associative learning in humans and animals.
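The "provoked feedback" training loop described above can be illustrated with a short sketch. This is a minimal illustration, not code from the dissertation; the names (FeedbackRecorder, training_episode, capture_user_reaction) are hypothetical, and the only assumption is that the robot knows in advance whether a chosen game move is good or bad and can record the user's multimodal reaction.

```python
import random


class FeedbackRecorder:
    """Collects (observation, expected_label) pairs for later feedback-model training."""

    def __init__(self):
        self.samples = []

    def record(self, observation, expected_label):
        self.samples.append((observation, expected_label))


def training_episode(recorder, capture_user_reaction, n_moves=20):
    """One game-like training task in which the robot provokes the feedback it wants to learn."""
    for _ in range(n_moves):
        # The robot deliberately chooses a move whose quality it already knows,
        # so the user's reaction can be labelled without being understood.
        make_good_move = random.random() < 0.5
        expected_label = "positive" if make_good_move else "negative"

        # Stand-in for executing the move on screen and capturing the user's
        # multimodal reaction (speech audio, prosodic features, touch events).
        observation = capture_user_reaction(make_good_move)
        recorder.record(observation, expected_label)


if __name__ == "__main__":
    recorder = FeedbackRecorder()
    # Dummy sensing function; the real system would return recorded sensor data.
    training_episode(recorder, lambda good: {"speech": [], "prosody": [], "touch": []})
    print(f"collected {len(recorder.samples)} labelled user reactions")
```

The reactions collected and labelled this way are the raw material for the two-stage learning algorithm summarized below.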
In the first stage, which corresponds to stimulus encoding in natural learning, unsupervised training of HMMs is used to model the incoming speech and prosody stimuli. Touch stimuli are represented using a simple duration-based model. Unsupervised training of HMMs allows the system to cluster similar perceptions without depending on explicit transcriptions of what the user has said or done, which are not available when learning through natural interaction.

Utterances and meanings usually cannot be mapped one-to-one, because the same meaning can be expressed by multiple utterances, and the same utterance can have different meanings. This is handled by the associative learning stage, which associates the trained HMMs with meanings and integrates perceptions from different modalities using an implementation of classical conditioning. The meanings are inferred from the robot's situation. For example, if the robot has just requested the task server to show a dirty spot on the carpet, it assumes that the following utterance means clean(carpet); the system therefore first searches for a match of any of the HMMs associated with the meaning "carpet". Then, the remainder of the utterance is used to train an HMM sequence to be associated with the meaning "to clean". The positions of the detected parameters are used to insert appropriate placeholders in the recognition grammar.

In a first study, based on game-like tasks, the robot learned to discriminate between positive and negative feedback based on speech, prosody and touch with an average accuracy of 95.97%. Performance in the more complex command-learning task was 84.45% for distinguishing eight commands with 16 possible parameters.
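As a rough illustration of the associative stage, the sketch below uses a Rescorla-Wagner-style conditioning update to link co-occurring stimulus clusters (e.g. IDs of matched unsupervised HMMs or touch-duration classes) to the meaning inferred from the robot's situation. The class, the stimulus IDs and the exact update rule are assumptions for illustration only and are not taken from the dissertation.

```python
from collections import defaultdict


class AssociativeLearner:
    """Toy classical-conditioning-style association between stimuli and meanings."""

    def __init__(self, learning_rate=0.2, max_strength=1.0):
        self.lr = learning_rate
        self.max_strength = max_strength
        # strength[meaning][stimulus_id] -> associative strength
        self.strength = defaultdict(lambda: defaultdict(float))

    def update(self, stimulus_ids, inferred_meaning):
        """Strengthen links between the present stimuli and the situation-derived meaning."""
        # Rescorla-Wagner: all present stimuli share one prediction error.
        total = sum(self.strength[inferred_meaning][s] for s in stimulus_ids)
        delta = self.lr * (self.max_strength - total)
        for s in stimulus_ids:
            self.strength[inferred_meaning][s] += delta

    def best_meaning(self, stimulus_ids):
        """Return the meaning with the highest summed associative strength, if any."""
        scores = {
            meaning: sum(links[s] for s in stimulus_ids)
            for meaning, links in self.strength.items()
        }
        return max(scores, key=scores.get) if scores else None


if __name__ == "__main__":
    learner = AssociativeLearner()
    # Hypothetical stimulus cluster IDs produced by the unsupervised encoding stage.
    for _ in range(10):
        learner.update(["hmm_speech_03", "touch_long"], "approve")
        learner.update(["hmm_speech_07"], "clean(carpet)")
    print(learner.best_meaning(["hmm_speech_03", "touch_long"]))  # -> approve
```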

application/pdf

総研大甲第1384号

Codes
  • NII Article ID (NAID)
    • 500000538784
  • NII Author ID (NRID)
    • 8000000540786
  • Text Lang
    • eng
  • NDLBibID
    • 000011198842
  • Source
    • Institutional Repository
    • NDL ONLINE