HMM-Based Style Control for Expressive Speech Synthesis with Arbitrary Speaker's Voice Using Model Adaptation
-
- NOSE Takashi
- Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology
-
- TACHIBANA Makoto
- Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology
-
- KOBAYASHI Takao
- Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology
この論文をさがす
抄録
This paper presents methods for controlling the intensity of emotional expressions and speaking styles of an arbitrary speaker's synthetic speech by using a small amount of his/her speech data in HMM-based speech synthesis. Model adaptation approaches are introduced into the style control technique based on the multiple-regression hidden semi-Markov model (MRHSMM). Two different approaches are proposed for training a target speaker's MRHSMMs. The first one is MRHSMM-based model adaptation in which the pretrained MRHSMM is adapted to the target speaker's model. For this purpose, we formulate the MLLR adaptation algorithm for the MRHSMM. The second method utilizes simultaneous adaptation of speaker and style from an average voice model to obtain the target speaker's style-dependent HSMMs which are used for the initialization of the MRHSMM. From the result of subjective evaluation using adaptation data of 50 sentences of each style, we show that the proposed methods outperform the conventional speaker-dependent model training when using the same size of speech data of the target speaker.
収録刊行物
-
- IEICE Transactions on Information and Systems
-
IEICE Transactions on Information and Systems E92-D (3), 489-497, 2009
一般社団法人 電子情報通信学会
- Tweet
キーワード
詳細情報 詳細情報について
-
- CRID
- 1390282679354109568
-
- NII論文ID
- 10026808084
-
- NII書誌ID
- AA10826272
-
- ISSN
- 17451361
- 09168532
-
- 本文言語コード
- en
-
- データソース種別
-
- JaLC
- Crossref
- CiNii Articles
-
- 抄録ライセンスフラグ
- 使用不可