Unsupervised Speaker Adaptation Based on HMM Sufficient Statistics Using Multiple Acoustic Models Under Noisy Environment

Abstract

Speaker adaptation is necessary in speech recognition to achieve high accuracy across a wide variety of speakers. At the same time, a class-dependent (CD) acoustic model trained for a specific gender or age class can give better accuracy than a single speaker-independent (SI) model. In this research, given the growing variety of available speech databases, we extend unsupervised speaker adaptation based on HMM sufficient statistics (HMM-SS) to multiple databases and multiple initial models. Whereas the conventional approach uses a single SI model as the base model, the proposed method uses multiple CD models to improve the initial model before adaptation. First, Gaussian mixture models (GMMs) are used to select the N-best neighbor speakers acoustically closest to the input speech. During this speaker selection, the speaker's class is estimated from the classes of those neighbor speakers, and the corresponding CD model is adopted as the base model. Unsupervised speaker adaptation is then performed by constructing an HMM from the HMM-SS of the selected speakers in that class. Experiments were carried out on two JNAS databases, adult and senior speakers, with testing under office, crowd, exhibition booth, and car noise at 20 dB SNR. Recognition results show that the proposed multiple-model method outperforms the conventional approach. Moreover, a comparison with supervised Maximum Likelihood Linear Regression (MLLR) adaptation using 10 utterances confirms that our method performs better with only a single input utterance.
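
The three steps named in the abstract (GMM-based neighbor selection, class estimation, and rebuilding a model from pooled sufficient statistics) can be illustrated with a minimal sketch. This is not the authors' implementation. The sketch assumes 1-D features, one Gaussian per speaker GMM, a single HMM-state Gaussian per speaker, and a majority vote for class estimation; none of these details are given in the record. All names (`select_neighbors`, `estimate_class`, `rebuild_gaussian`, the toy speakers) are hypothetical.

```python
import math
from collections import Counter


def gmm_log_likelihood(frames, gmm):
    """Total log-likelihood of 1-D frames under a GMM of (weight, mean, var) tuples."""
    total = 0.0
    for x in frames:
        p = sum(w * math.exp(-0.5 * (x - m) ** 2 / v) / math.sqrt(2 * math.pi * v)
                for w, m, v in gmm)
        total += math.log(max(p, 1e-300))  # floor avoids log(0) on outliers
    return total


def select_neighbors(frames, speakers, n_best):
    """Return the n_best training speakers whose GMMs score the input highest."""
    ranked = sorted(speakers,
                    key=lambda s: gmm_log_likelihood(frames, s["gmm"]),
                    reverse=True)
    return ranked[:n_best]


def estimate_class(neighbors):
    """Assumed rule: majority vote over the neighbors' class labels."""
    return Counter(s["cls"] for s in neighbors).most_common(1)[0][0]


def rebuild_gaussian(neighbors, cls):
    """Pool (count, sum, sum of squares) over same-class neighbors,
    then re-estimate the mean and variance of one state Gaussian."""
    stats = [s["stats"] for s in neighbors if s["cls"] == cls]
    n = sum(c for c, _, _ in stats)
    sx = sum(a for _, a, _ in stats)
    sxx = sum(b for _, _, b in stats)
    mean = sx / n
    var = sxx / n - mean ** 2
    return mean, var


# Toy training speakers (hypothetical values, not data from the paper).
speakers = [
    {"name": "adult_a", "cls": "adult", "gmm": [(1.0, 0.0, 1.0)], "stats": (10.0, 0.0, 10.0)},
    {"name": "adult_b", "cls": "adult", "gmm": [(1.0, 1.0, 1.0)], "stats": (10.0, 10.0, 20.0)},
    {"name": "senior_a", "cls": "senior", "gmm": [(1.0, 3.0, 1.0)], "stats": (10.0, 30.0, 100.0)},
    {"name": "senior_b", "cls": "senior", "gmm": [(1.0, 3.5, 1.0)], "stats": (10.0, 35.0, 132.5)},
]

frames = [0.2, 0.4, 0.1, 0.3]                   # one short input utterance
neighbors = select_neighbors(frames, speakers, n_best=3)
cls = estimate_class(neighbors)                 # class of the base CD model
mean, var = rebuild_gaussian(neighbors, cls)    # adapted state parameters
print(cls, mean, var)
```

In a real system the selected CD model would supply the HMM topology and the statistics would cover every state and mixture component; the sketch pools a single Gaussian only to show how the sufficient statistics combine.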


Journal

  • IPSJ SIG Notes

    IPSJ SIG Notes 2004(131(2004-SLP-054)), 205-210, 2004-12-21

    Information Processing Society of Japan (IPSJ)

References:  8

Codes

  • NII Article ID (NAID)
    110002950608
  • NII NACSIS-CAT ID (NCID)
    AN10442647
  • Text Lang
    ENG
  • Article Type
    Technical Report
  • ISSN
    09196072
  • NDL Article ID
    7214202
  • NDL Source Classification
    ZM13 (Science and technology -- General science and technology -- Data processing and computers)
  • NDL Call No.
    Z14-1121
  • Data Source
    CJP  NDL  NII-ELS  IPSJ 