マルチモーダル音声認識におけるストリーム重み係数最適化の検討 / Investigation of a stream-weight optimization method for multi-modal speech recognition [in Japanese]

Abstract

Audio-visual multi-modal speech recognition, which uses lip-movement video in addition to the audio signal, has recently attracted considerable research attention as a way to make automatic speech recognition (ASR) more robust. In the multi-stream HMMs widely used for multi-modal ASR, adjusting the stream weight factors automatically and appropriately is essential for improving recognition performance. This paper proposes a stream-weight optimization technique based on a likelihood-ratio maximization criterion, which chooses the weights so that the likelihood difference between the correct (hypothesized) word and the other words is maximized. The method was evaluated in an unsupervised setting through recognition experiments on real-world data recorded with an in-car camera. Combining maximum likelihood linear regression (MLLR) adaptation with the proposed optimization yielded an absolute accuracy improvement of about 29% and a relative error-rate reduction of about 76% compared with the audio-only result.
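To make the idea of a stream weight concrete, the following is a minimal sketch, not the authors' algorithm: this record does not give the paper's formulation. It assumes the usual log-linear combination of audio and visual log-likelihoods with weights that sum to 1, and it stands in for the likelihood-ratio criterion with a simple margin (the log-likelihood gap between the top-scoring word and its best rival) searched over a grid. The function names, the grid search, and the toy scores are all hypothetical.

```python
def combine(audio_ll, visual_ll, audio_weight):
    """Weighted sum of per-word log-likelihoods; the two weights sum to 1."""
    visual_weight = 1.0 - audio_weight
    return [audio_weight * a + visual_weight * v
            for a, v in zip(audio_ll, visual_ll)]


def margin(audio_ll, visual_ll, audio_weight):
    """Log-likelihood gap between the top-scoring word and its best rival.

    No reference transcript is used: the "hypothesis" is whichever word
    scores highest under the current weight (an unsupervised stand-in).
    """
    ranked = sorted(combine(audio_ll, visual_ll, audio_weight), reverse=True)
    return ranked[0] - ranked[1]


def pick_audio_weight(audio_ll, visual_ll, steps=10):
    """Grid-search the audio weight in [0, 1] that maximizes the margin."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda w: margin(audio_ll, visual_ll, w))


# Toy log-likelihoods for three candidate words (made-up numbers):
# audio confuses word 0 with word 1, video confuses word 0 with word 2.
audio_ll = [-10.0, -12.0, -20.0]
visual_ll = [-10.0, -20.0, -12.0]
best = pick_audio_weight(audio_ll, visual_ll)
```

On these toy scores either stream alone separates the top word from its nearest rival by only 2, while an equal weighting separates it by 6, so the search settles on an audio weight of 0.5. A real system would accumulate the criterion over many utterances and over HMM state sequences rather than isolated word scores.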

Journal

  • IPSJ SIG Notes

    IPSJ SIG Notes 2003(124(2003-SLP-049)), 241-246, 2003-12-18

    Information Processing Society of Japan (IPSJ)

References:  7

Codes

  • NII Article ID (NAID)
    110002913754
  • NII NACSIS-CAT ID (NCID)
    AN10442647
  • Text Lang
    JPN
  • Article Type
    Technical Report
  • ISSN
    09196072
  • NDL Article ID
    6824077
  • NDL Source Classification
    ZM13 (Science and technology -- Science and technology in general -- Data processing and computers)
  • NDL Call No.
    Z14-1121
  • Data Source
    CJP  NDL  NII-ELS  IPSJ 