Improvement of multimodal speech recognition by normalizing visual features [in Japanese] (original title: 画像特徴量の正規化によるマルチモーダル音声認識の改善)

Abstract

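Multimodal speech recognition, which uses both speech and lip video, has attracted attention and been actively studied as a noise-robust form of speech recognition. Visual features play an important role in multimodal speech recognition, and the effectiveness of various features, such as optical flow and principal component scores, has been demonstrated. For visual features, recognition performance depends heavily not only on what information is used but also on how processing such as orthogonalization and normalization is applied. This study therefore examined the orthogonalization of visual features in several ways. Specifically, the recognition rate was improved by applying singular value decomposition or principal component analysis to the visual features.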

Multimodal automatic speech recognition (MMASR), which uses both speech and lip images, has been developed as an automatic speech recognition (ASR) approach that is robust against various kinds of noise. Visual features, such as optical-flow parameters or principal component analysis (PCA) coefficients, play a major role in MMASR, and their effectiveness has been demonstrated experimentally. Recognition accuracy depends not only on which visual information is adopted but also on how feature orthogonalization and normalization are applied. This paper compares conventional normalization methods for visual features and their performance: extracted visual features are converted into uncorrelated parameters using singular value decomposition (SVD) or PCA, and recognition accuracy improves when these converted features are used.
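The conversion of visual features into uncorrelated parameters described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `decorrelate_features`, the synthetic 4-dimensional "visual features", and the optional scaling of each component to unit variance are all assumptions made for illustration, and the record does not say which normalization variants the paper actually compared. The sketch computes PCA through an SVD of the mean-centered feature matrix using NumPy.

```python
import numpy as np


def decorrelate_features(frames, n_components=None, unit_variance=True, eps=1e-12):
    """Project per-frame feature vectors onto uncorrelated axes (PCA via SVD).

    frames: array-like of shape (n_frames, n_dims), one feature vector per frame.
    Returns (scores, mean, components, std). Hypothetical helper for illustration.
    """
    X = np.asarray(frames, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean  # center each feature dimension

    # Thin SVD of the centered data: the rows of Vt are the principal axes.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    if n_components is not None:
        s, Vt = s[:n_components], Vt[:n_components]

    scores = Xc @ Vt.T  # projected features, mutually uncorrelated
    std = s / np.sqrt(max(X.shape[0] - 1, 1))  # sample std of each component
    if unit_variance:
        # Assumed extra normalization step: scale each component to unit variance.
        scores = scores / np.maximum(std, eps)
    return scores, mean, Vt, std


# Synthetic stand-in for lip-image features: 200 frames, 4 correlated dimensions.
rng = np.random.default_rng(0)
mixing = rng.normal(size=(4, 4))
frames = rng.normal(size=(200, 4)) @ mixing

scores, mean, components, std = decorrelate_features(frames)
cov = np.cov(scores, rowvar=False)  # close to the identity matrix after whitening
```

With `unit_variance=False` the projected features stay uncorrelated but keep their original variances, which corresponds to plain PCA scores. In practice the mean and principal axes would be estimated on training data and reused unchanged on test data.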

Journal

  • IEICE technical report

    IEICE technical report 108(312), 7-12, 2008-11-13

    The Institute of Electronics, Information and Communication Engineers

References:  9

Cited by:  5

Codes

  • NII Article ID (NAID)
    110007114252
  • NII NACSIS-CAT ID (NCID)
    AN10013221
  • Text Lang
    JPN
  • Article Type
    Journal Article
  • ISSN
    09135685
  • NDL Article ID
    9738683
  • NDL Source Classification
    ZN33 (Science and technology -- Electrical engineering and electrical machinery industry -- Electronics and telecommunications)
  • NDL Call No.
    Z16-940
  • Data Source
    CJP  CJPref  NDL  NII-ELS 