A Classification Method of Spoken Words in Continuous Speech for Many Speakers

Authors

Abstract

The speech wave is converted into a time series of short-time spectra by a 20-channel filter bank and is segmented into four groups: silence, unvoiced-non-fricative, unvoiced-non-plosive, and voiced. The unvoiced groups are classified into phoneme units by heuristic algorithms, and the voiced group by Bayes' rule. To normalize the variation of reference patterns among speakers, vowel patterns are learned by a non-supervised learning method. The optimum matching between a recognized phoneme string and the phoneme string of a given word in the word dictionary is performed using a phoneme similarity matrix and dynamic programming. In tests on 1,500 samples of isolated digits spoken by 20 male speakers, about 97% were correctly recognized, and with the system adapted to each speaker, 98% were correctly recognized.

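The word-matching step in the abstract can be illustrated with a short sketch: a recognized phoneme string is aligned against each dictionary word's phoneme string by dynamic programming, with substitutions scored through a phoneme similarity matrix. This is a minimal reconstruction under stated assumptions, not the paper's implementation; the phoneme symbols, similarity values, gap penalty, and dictionary entries below are hypothetical placeholders.

    # Minimal sketch of DP word matching with a phoneme similarity matrix.
    # All scores, symbols, and dictionary entries are illustrative only.

    GAP = -1.0  # assumed penalty for an inserted or deleted phoneme

    # Hypothetical similarity entries: confusable phoneme pairs score higher.
    SIM = {("i", "e"): 0.5, ("t", "d"): 0.5}

    def sim(a, b):
        # Exact matches score high, listed confusions moderately, others low.
        if a == b:
            return 2.0
        return SIM.get((a, b), SIM.get((b, a), -0.5))

    def dp_match(recognized, word):
        """Best alignment score between two phoneme strings (edit-distance-style DP)."""
        n, m = len(recognized), len(word)
        score = [[0.0] * (m + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            score[i][0] = score[i - 1][0] + GAP
        for j in range(1, m + 1):
            score[0][j] = score[0][j - 1] + GAP
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                score[i][j] = max(
                    score[i - 1][j - 1] + sim(recognized[i - 1], word[j - 1]),  # match/substitute
                    score[i - 1][j] + GAP,  # extra recognized phoneme
                    score[i][j - 1] + GAP,  # missing dictionary phoneme
                )
        return score[n][m]

    # Usage: choose the dictionary word whose phoneme string aligns best.
    dictionary = {"ichi": ["i", "t", "i"], "ni": ["n", "i"], "san": ["s", "a", "N"]}
    recognized = ["i", "t", "i"]
    print(max(dictionary, key=lambda w: dp_match(recognized, dictionary[w])))  # -> ichi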

Published in

  • Information Processing in Japan

    Information Processing in Japan 17(0), 6-13, 1977-01-01

    Information Processing Society of Japan

Identifiers

  • NII Article ID (NAID)
    110002672341
  • NII Bibliographic ID (NCID)
    AA00674393
  • Text Language Code
    ENG
  • Material Type
    Article
  • Data Source
    NII-ELS  IPSJ