A system for the synthesis of high-quality speech from texts on general weather conditions

この論文をさがす

抄録

A text-to-speech conversion system for Japanese has been developed for the purpose of producing high-quality speech output. This system consists of four processing stages: 1) linguistic processing,2) phonological processing, 3) control parameter generation, and 4) speech waveform generation. Although the processing at the first stage is resticted to the texts on general weather conditions, the other three stages can also cope with texts of news and narrations on other topics. Since the prosodic features of speech are largely related to the linguistic information, such as word accent, syntactic structure and discourse structure, linguistic processing of a wider range than ever, at least a sentence, is indispensable to obtain good quality speech with respect to the prosody. From this point of view, input text was restricted to the weather forecast sentences and a method for linguistic processing was developed to conduct morpheme, syntactic and semantic analyses simultaneously. A quantitative model for generating fundamental frequency contours was adopted to make a good reflection of the linguistic information on the prosody of synthetic speech. A set of prosodic rules was constructed to generate prosodic symbols representing prosodic structures of the text from the linguistic information obtained at the first stage. A new speech synthesizer based on the terminal analog method was also developed to improve the segmental quality of synthetic speech, It consists of four paths of cascade connection of pole / zero filters and three waveform generators. The four paths are respectively used for the synthesis of vowels and vowel-like sounds, nasal murmur and buzz bar, friction, and plosion, while the three generators produce voicing source waveform approximated by polynomials, white Gaussian noise source for fricatives and impulse source for plosives. The validity of the approach above has been confirmed by the listening tests using speech synthesized by the developed system. Improvements both in the quality of prosodic features and in the quality of segmental features were realized for the synthetic speech.

収録刊行物

被引用文献 (16)*注記

もっと見る

詳細情報 詳細情報について

  • CRID
    1572543027348180864
  • NII論文ID
    110003215432
  • NII書誌ID
    AA10826239
  • ISSN
    09168508
  • 本文言語コード
    en
  • データソース種別
    • CiNii Articles

問題の指摘

ページトップへ