A Training Method of Average Voice Model for HMM-Based Speech Synthesis
- YAMAGISHI Junichi
  - Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology
- TAMURA Masatsune
  - Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology
- MASUKO Takashi
  - Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology
- TOKUDA Keiichi
  - Department of Computer Science, Nagoya Institute of Technology
- KOBAYASHI Takao
  - Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology
Abstract
This paper describes a new training method for the average voice model used in speech synthesis, in which an arbitrary speaker's voice is generated by speaker adaptation. When the amount of training data is limited, the distributions of the average voice model are often biased by speaker and/or gender, which degrades the quality of the synthetic speech. To reduce this speaker dependence, the proposed method incorporates a context clustering technique called shared decision tree context clustering, together with speaker adaptive training, into the training procedure of the average voice model. Subjective tests show that the average voice model trained with the proposed method generates more natural-sounding speech than the conventional average voice model. Moreover, the voice characteristics and prosodic features of synthetic speech generated from the model adapted with the proposed method are closer to those of the target speaker than those obtained with the conventional method.
Journal
- IEICE Trans. Fundamentals, A
IEICE Trans. Fundamentals, A 86 (8), 1956-1963, 2003-08-01
The Institute of Electronics, Information and Communication Engineers
Details
- CRID: 1571980077248221568
- NII Article ID: 110003221277
- NII Book ID: AA10826239
- ISSN: 09168508
- Text Lang: en
- Data Source: CiNii Articles