GMM-Based Voice Conversion Applied to Emotional Speech Synthesis


Abstract

A voice conversion method is applied to synthesizing emotional speech from standard read (neutral) speech. Pairs of neutral and emotional utterances are used to train the conversion rules. The conversion uses a GMM (Gaussian Mixture Model) combined with DFW (Dynamic Frequency Warping). We also adopt STRAIGHT, a high-quality speech analysis-synthesis algorithm. The target emotions are (hot) anger, (cold) sadness, and (hot) happiness. The converted speech is first evaluated objectively, using mel-cepstral distortion as the criterion. The results confirm that GMM-based voice conversion reduces the distortion between the target speech and the neutral speech. A subjective test is also carried out to investigate the perceptual effect. To examine the influence of prosody, two kinds of prosody are used for synthesis: natural prosody extracted from neutral speech and natural prosody extracted from emotional speech. The results show that prosody is the main contributor to perceived emotion and that spectral conversion can reinforce it.
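
The abstract names mel-cepstral distortion as the objective criterion but does not give its formula. The sketch below uses one commonly cited definition, MCD [dB] = (10 / ln 10) * sqrt(2 * sum_d (c_d - c'_d)^2), averaged over frames; this may differ from the exact variant used in the paper. The function name, the assumption that frames are already time-aligned, and the exclusion of the 0th (energy) coefficient are illustrative choices, not details from the source.

```python
import math


def mel_cepstral_distortion(target_frames, converted_frames):
    """Mean mel-cepstral distortion in dB over time-aligned frames.

    Each frame is a sequence of mel-cepstral coefficients. The 0th
    coefficient (energy) is skipped, which is a common convention and an
    assumption here, not something stated in the abstract.
    """
    if not target_frames or len(target_frames) != len(converted_frames):
        raise ValueError("frame sequences must be non-empty and equal in length")

    scale = 10.0 / math.log(10.0)
    total = 0.0
    for target, converted in zip(target_frames, converted_frames):
        # Squared Euclidean distance over coefficients 1..D.
        squared = sum((t - c) ** 2 for t, c in zip(target[1:], converted[1:]))
        total += scale * math.sqrt(2.0 * squared)
    return total / len(target_frames)
```

Under this definition, a lower value for converted speech than for unconverted neutral speech, both measured against the same emotional target, would indicate that the conversion moved the spectrum toward the target.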
