A Training Method of Average Voice Model for HMM-Based Speech Synthesis

YAMAGISHI Junichi, TAMURA Masatsune, MASUKO Takashi, TOKUDA Keiichi, KOBAYASHI Takao

Abstract

This paper describes a new training method for the average voice model used in speech synthesis, in which an arbitrary speaker's voice is generated by speaker adaptation. When the amount of training data is limited, the distributions of the average voice model are often biased by speaker and/or gender, which degrades the quality of the synthetic speech. To reduce this speaker dependence, the proposed method incorporates a context clustering technique called shared decision tree context clustering, together with speaker adaptive training, into the training procedure of the average voice model. Subjective tests show that the average voice model trained with the proposed method generates more natural-sounding speech than the conventional average voice model. Moreover, the voice characteristics and prosodic features of synthetic speech generated from the model adapted with the proposed method are closer to those of the target speaker than with the conventional method.

Published in

IEICE transactions on fundamentals of electronics, communications and computer sciences 86 (8), 1956-1963, 2003-08-01

The Institute of Electronics, Information and Communication Engineers (IEICE)

Details

CRID: 1571980077248221568

NII Article ID: 110003221277

NII Bibliographic ID: AA10826239

ISSN: 0916-8508

Text language code: en

Data source type

CiNii Articles
