HMM-Based Style Control for Expressive Speech Synthesis with Arbitrary Speaker's Voice Using Model Adaptation

  • NOSE Takashi
    Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology
  • TACHIBANA Makoto
    Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology
  • KOBAYASHI Takao
    Interdisciplinary Graduate School of Science and Engineering, Tokyo Institute of Technology

Abstract

This paper presents methods for controlling the intensity of emotional expressions and speaking styles in the synthetic speech of an arbitrary speaker, using only a small amount of that speaker's speech data in HMM-based speech synthesis. Model adaptation approaches are introduced into the style control technique based on the multiple-regression hidden semi-Markov model (MRHSMM). Two approaches are proposed for training a target speaker's MRHSMM. The first is MRHSMM-based model adaptation, in which a pretrained MRHSMM is adapted to the target speaker; for this purpose, we formulate the MLLR adaptation algorithm for the MRHSMM. The second uses simultaneous adaptation of speaker and style from an average voice model to obtain the target speaker's style-dependent HSMMs, which are then used to initialize the MRHSMM. A subjective evaluation using 50 adaptation sentences per style shows that the proposed methods outperform conventional speaker-dependent model training when the same amount of target-speaker speech data is used.
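As a rough illustration of the style control idea behind the MRHSMM, the sketch below assumes that each state's mean vector is a linear regression on a low-dimensional style vector, mu = H [1, v]^T, so that scaling v changes the intensity of the style. The function names, the toy matrices, and the way the affine transform is folded into H are illustrative assumptions, not the formulation given in the paper; in particular, `adapt_regression_matrix` only shows that an MLLR-style affine transform mu' = A mu + b can be absorbed into the regression matrix.

```python
def style_mean(H, style):
    """Return mu = H @ [1, v_1, ..., v_L] for one state (plain-list matrices)."""
    xi = [1.0] + list(style)  # augmented style vector with a bias term
    if any(len(row) != len(xi) for row in H):
        raise ValueError("each row of H needs 1 + len(style) columns")
    return [sum(h * x for h, x in zip(row, xi)) for row in H]


def adapt_regression_matrix(H, A, b):
    """Fold an affine transform mu' = A mu + b into H (assumed illustration).

    Since the first element of the augmented style vector is 1,
    A (H xi) + b equals (A H + [b, 0, ..., 0]) xi.
    """
    rows, cols = len(A), len(H[0])
    adapted = [
        [sum(A[i][k] * H[k][j] for k in range(len(H))) for j in range(cols)]
        for i in range(rows)
    ]
    for i in range(rows):
        adapted[i][0] += b[i]  # the bias only affects the constant column
    return adapted


# Toy example: 2-dimensional mean, 1-dimensional style space (hypothetical values).
H = [[1.0, 2.0],
     [0.0, -1.0]]
neutral = style_mean(H, [0.0])     # style vector at the origin
weak = style_mean(H, [0.5])        # weakened style
emphasized = style_mean(H, [1.5])  # exaggerated style

# Hypothetical speaker transform applied to the pretrained regression matrix.
A = [[2.0, 0.0],
     [0.0, 1.0]]
b = [1.0, 1.0]
H_target = adapt_regression_matrix(H, A, b)
```

In this toy setup, moving the style vector along one axis interpolates or extrapolates the mean linearly, which is the sense in which the intensity of a style can be controlled.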
