A system for the synthesis of high-quality speech from texts on general weather conditions
-
- HIROSE K.
- the Faculty of Engineering, The University of Tokyo
-
- Fujisaki Hiroya
- the Faculty of Fundamental Engineering, Science University
Abstract
A text-to-speech conversion system for Japanese has been developed for the purpose of producing high-quality speech output. The system consists of four processing stages: 1) linguistic processing, 2) phonological processing, 3) control parameter generation, and 4) speech waveform generation. Although processing at the first stage is restricted to texts on general weather conditions, the other three stages can also cope with news texts and narrations on other topics. Since the prosodic features of speech are closely related to linguistic information such as word accent, syntactic structure and discourse structure, linguistic processing over a wider span than before, at least a whole sentence, is indispensable for obtaining speech of good prosodic quality. From this point of view, the input text was restricted to weather forecast sentences, and a method for linguistic processing was developed that conducts morphemic, syntactic and semantic analyses simultaneously. A quantitative model for generating fundamental frequency contours was adopted so that the linguistic information is well reflected in the prosody of the synthetic speech. A set of prosodic rules was constructed to generate prosodic symbols, which represent the prosodic structures of the text, from the linguistic information obtained at the first stage. A new speech synthesizer based on the terminal analog method was also developed to improve the segmental quality of the synthetic speech. It consists of four paths of cascade-connected pole/zero filters and three waveform generators. The four paths are used, respectively, for the synthesis of vowels and vowel-like sounds, nasal murmur and buzz bar, friction, and plosion, while the three generators produce a voicing source waveform approximated by polynomials, a white Gaussian noise source for fricatives, and an impulse source for plosives.
The validity of this approach has been confirmed by listening tests using speech synthesized by the developed system. The synthetic speech showed improvements in the quality of both prosodic and segmental features.
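As a rough illustration of the cascade connection described in the abstract, the following Python sketch passes a source signal through a chain of two-pole digital resonators, as a vowel path of a terminal analog synthesizer might. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract gives no filter equations, so the resonator form, the function names (`resonator_coeffs`, `resonate`, `cascade`, `impulse_train`, `gaussian_noise`) and the formant values are all hypothetical. Zeros are omitted, and a plain impulse train stands in for the polynomial voicing source.

```python
import math
import random


def resonator_coeffs(freq_hz, bw_hz, fs):
    """Coefficients (a, b, c) of a two-pole resonator with unity gain at DC."""
    r = math.exp(-math.pi * bw_hz / fs)            # pole radius from bandwidth
    c = -r * r
    b = 2.0 * r * math.cos(2.0 * math.pi * freq_hz / fs)
    a = 1.0 - b - c                                # normalizes the gain at 0 Hz
    return a, b, c


def resonate(x, freq_hz, bw_hz, fs):
    """Apply y[n] = a*x[n] + b*y[n-1] + c*y[n-2] to the sequence x."""
    a, b, c = resonator_coeffs(freq_hz, bw_hz, fs)
    y1 = y2 = 0.0
    out = []
    for sample in x:
        y = a * sample + b * y1 + c * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def cascade(x, formants, fs):
    """Pass x through one resonator per (frequency, bandwidth) pair, in series."""
    for freq_hz, bw_hz in formants:
        x = resonate(x, freq_hz, bw_hz, fs)
    return x


def impulse_train(n, f0_hz, fs):
    """Simplified periodic source: one unit impulse per pitch period."""
    period = int(round(fs / f0_hz))
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]


def gaussian_noise(n, seed=0):
    """White Gaussian noise, the kind of source the abstract names for fricatives."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


if __name__ == "__main__":
    fs = 10000                                     # assumed sampling rate in Hz
    vowel_like = [(700, 80), (1200, 90), (2500, 120)]  # illustrative values only
    voiced = cascade(impulse_train(1000, 120, fs), vowel_like, fs)
    noisy = cascade(gaussian_noise(1000), [(4000, 500)], fs)
    print(len(voiced), len(noisy))
```

In a cascade arrangement the relative formant amplitudes follow from the filter chain itself, so only frequencies and bandwidths need to be specified for each path.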
Published in
IEICE Trans. Fundamentals, A 76 (11), 1971-1980, 1993
The Institute of Electronics, Information and Communication Engineers (IEICE)
Details
-
- CRID
- 1572543027348180864
-
- NII Article ID
- 110003215432
-
- NII Bibliographic ID
- AA10826239
-
- ISSN
- 09168508
-
- Text language code
- en
-
- Data source type
-
- CiNii Articles