Improving WFST-based Language Understanding Accuracy by Weighting for ASR Results and Concepts

Authors

    • FUKUBAYASHI YUICHIRO
    • Dept. of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University
    • KOMATANI KAZUNORI
    • Dept. of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University
    • OGATA TETSUYA
    • Dept. of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University
    • OKUNO HIROSHI G.
    • Dept. of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University

Abstract

Weighted Finite State Transducers (WFSTs) have become common as language understanding modules in spoken dialogue systems. In WFST-based language understanding, robustness is achieved by assigning an appropriate weight to each word in the word sequence produced by automatic speech recognition (ASR). However, learning these weights generally requires a large amount of training data, which has made WFST-based language understanding difficult to apply to spoken dialogue systems built for new domains. We therefore developed a method that abstracts ASR results into classes such as fillers, words, and concepts, and assigns weights to them based on the number of phonemes and the ASR confidence. This makes it easy to build a robust language understanding module even when a large amount of training data is not available. Experimental results showed that language understanding accuracy improved when the weights were set appropriately according to the ASR accuracy of the utterance. This result suggests that language understanding accuracy can be improved by selecting the weighting according to the situation, such as the user and the ASR accuracy of the utterance.
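The abstract describes the weighting only at a high level, so the following is a minimal sketch of the general idea under stated assumptions, not the authors' implementation. It does not build a real WFST or use any WFST library; instead it enumerates the readings a small transducer over a toy concept grammar would accept, where each recognized word is either mapped to a filler (no output) or accepted, and accepted words with a concept label emit that concept. The reading with the highest total weight wins. The class `Word`, the functions `word_weight` and `understand`, the scheme names ("uniform", "phoneme", "confidence"), the constants `FILLER_WEIGHT` and `CONCEPT_BONUS`, the 0.5 confidence threshold, and all word data are hypothetical illustrations.

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional


@dataclass(frozen=True)
class Word:
    """One recognized word from the ASR result (toy representation)."""
    surface: str
    phonemes: int                  # number of phonemes in the word
    confidence: float              # ASR confidence score in [0, 1]
    concept: Optional[str] = None  # concept label, or None for a plain word


# Hypothetical constants, not values from the paper.
FILLER_WEIGHT = 0.0   # weight for mapping a word to a filler (no output)
CONCEPT_BONUS = 0.5   # extra weight for each concept that is formed


def word_weight(word: Word, scheme: str) -> float:
    """Weight for accepting `word` under a hypothetical weighting scheme."""
    if scheme == "uniform":
        return 1.0
    if scheme == "phoneme":
        return float(word.phonemes)
    if scheme == "confidence":
        # Phoneme count scaled by confidence; words below 0.5 confidence
        # (an assumed threshold) get a negative weight.
        return word.phonemes * (word.confidence - 0.5)
    raise ValueError(f"unknown scheme: {scheme}")


def understand(words, grammar, scheme):
    """Return (concepts, score) for the highest-weight reading, or None.

    Each word is either mapped to a filler or accepted; accepted words that
    carry a concept label emit that concept. Only readings whose concept
    sequence appears in `grammar` are allowed, standing in for the paths a
    small WFST would accept. Exhaustive enumeration is fine for toy inputs
    but exponential in the number of words.
    """
    best = None
    for accept in product((False, True), repeat=len(words)):
        concepts = tuple(
            w.concept for w, a in zip(words, accept) if a and w.concept
        )
        if concepts not in grammar:
            continue
        score = 0.0
        for w, a in zip(words, accept):
            if not a:
                score += FILLER_WEIGHT
            else:
                score += word_weight(w, scheme)
                if w.concept:
                    score += CONCEPT_BONUS
        if best is None or score > best[1]:
            best = (concepts, score)
    return best


# Toy utterance "ano kyoto eki" with made-up phoneme counts and confidences.
utterance = [
    Word("ano", 3, 0.4),                       # hesitation, no concept
    Word("kyoto", 4, 0.9, concept="place"),
    Word("eki", 3, 0.3, concept="facility"),   # low confidence
]
grammar = {(), ("place",), ("place", "facility")}

for scheme in ("uniform", "phoneme", "confidence"):
    concepts, score = understand(utterance, grammar, scheme)
    print(scheme, concepts, round(score, 2))
```

With these toy values, the "uniform" and "phoneme" schemes accept every word and return ('place', 'facility'), while the "confidence" scheme maps the low-confidence "ano" and "eki" to fillers and returns ('place',). This mirrors, in a simplified way, the abstract's point that which weighting works best depends on ASR accuracy; the sketch leaves the choice of scheme to the caller.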

Published in

  • IPSJ SIG Technical Report: Spoken Language Processing (SLP)

    2007(47(2007-SLP-066)), 43-48, 2007-05-25

    Information Processing Society of Japan

Identifiers

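  • NII Article ID (NAID)
    110006291116
  • NII Bibliographic ID (NCID)
    AN10442647
  • Text language code
    JPN
  • Document type
    Technical Report
  • ISSN
    0919-6072
  • NDL Article ID
    8762632
  • NDL Journal Classification
    ZM13 (Science and technology--General science and technology--Data processing, computers)
  • NDL Call Number
    Z14-1121
  • Data sources
    NDL, NII-ELS, IPSJ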