Speaker Adaptive Training Localizing Speaker Modules in DNN for Hybrid DNN-HMM Speech Recognizers
-
- OCHIAI Tsubasa
- Graduate School of Science and Engineering, Doshisha University
-
- MATSUDA Shigeki
- National Institution of Information and Communications Technology
-
- WATANABE Hideyuki
- National Institution of Information and Communications Technology
-
- LU Xugang
- National Institution of Information and Communications Technology
-
- HORI Chiori
- National Institution of Information and Communications Technology
-
- KAWAI Hisashi
- National Institution of Information and Communications Technology
-
- KATAGIRI Shigeru
- Graduate School of Science and Engineering, Doshisha University
Abstract
<p>Among various training concepts for speaker adaptation, Speaker Adaptive Training (SAT) has been successfully applied to a standard Hidden Markov Model (HMM) speech recognizer, whose state is associated with Gaussian Mixture Models (GMMs). On the other hand, focusing on the high discriminative power of Deep Neural Networks (DNNs), a new type of speech recognizer structure, which combines DNNs and HMMs, has been vigorously investigated in the speaker adaptation research field. Along these two lines, it is natural to conceive of further improvement to a DNN-HMM recognizer by employing the training concept of SAT. In this paper, we propose a novel speaker adaptation scheme that applies SAT to a DNN-HMM recognizer. Our SAT scheme allocates a Speaker Dependent (SD) module to one of the intermediate layers of DNN, treats its remaining layers as a Speaker Independent (SI) module, and jointly trains the SD and SI modules while switching the SD module in a speaker-by-speaker manner. We implement the scheme using a DNN-HMM recognizer, whose DNN has seven layers, and elaborate its utility over TED Talks corpus data. Our experimental results show that in the supervised adaptation scenario, our Speaker-Adapted (SA) SAT-based recognizer reduces the word error rate of the baseline SI recognizer and the lowest word error rate of the SA SI recognizer by 8.4% and 0.7%, respectively, and by 6.4% and 0.6% in the unsupervised adaptation scenario. The error reductions gained by our SA-SAT-based recognizers proved to be significant by statistical testing. The results also show that our SAT-based adaptation outperforms, regardless of the SD module layer selection, its counterpart SI-based adaptation, and that the inner layers of DNN seem more suitable for SD module allocation than the outer layers.</p>
Journal
-
- IEICE Transactions on Information and Systems
-
IEICE Transactions on Information and Systems E99.D (10), 2431-2443, 2016
The Institute of Electronics, Information and Communication Engineers
- Tweet
Details 詳細情報について
-
- CRID
- 1390282679356533760
-
- NII Article ID
- 130005598241
-
- ISSN
- 17451361
- 09168532
-
- Text Lang
- en
-
- Data Source
-
- JaLC
- Crossref
- CiNii Articles
- KAKEN
-
- Abstract License Flag
- Disallowed