Reducing Computation Time of the Rapid Unsupervised Speaker Adaptation based on HMM-Sufficient Statistics
この論文をさがす
抄録
In real-time speech recognition applications, there is a need to implement a fast and reliable adaptation algorithm. We propose a method to reduce adaptation time of the rapid unsupervised speaker adaptation based on HMM-Sufficient Statistics. We use only a single arbitrary utterance without transcriptions in selecting the N-best speakers' Sufficient Statistics created offline to provide data for adaptation to a target speaker. Further reduction of N-best implies a reduction in adaptation time. However, it degrades recognition performance due to insufficiency of data needed to robustly adapt the model. Linear interpolation of the global HMM-Sufficient Statistics offsets this negative effect and achieves a 50% reduction in adaptation time without compromising the recognition performance. Furthermore, we compared our method with Vocal Tract Length Normalization (VTLN), Maximum A Posteriori (MAP) and Maximum Likelihood Linear Regression (MLLR). Moreover, we tested in office, car, crowd and booth noise environments in 10dB, 15dB, 20dB and 25dB SNRs.
収録刊行物
-
- IEICE Transactions on Information and Systems
-
IEICE Transactions on Information and Systems E90-D (2), 554-561, 2007-02
電子情報通信学会
- Tweet
詳細情報 詳細情報について
-
- CRID
- 1050577309353406336
-
- NII論文ID
- 110007519501
-
- NII書誌ID
- AA10826272
-
- ISSN
- 09168532
-
- HANDLE
- 10061/7823
-
- 本文言語コード
- en
-
- 資料種別
- journal article
-
- データソース種別
-
- IRDB
- CiNii Articles