Automatic Language Identification Using Sequential Information of Phonemes

ARAI Takayuki

In this paper, approaches to language identification based on the sequential information of phonemes are described. These approaches assume that each language can be identified from its own phoneme structure, or phonotactics. To extract this phoneme structure, we use a phoneme classifier and a grammar for each language. The phoneme classifier for each language is implemented as a multilayer perceptron trained on quasi-phonetic hand-labeled transcriptions. After the phoneme classifiers are trained, the grammar for each language is calculated as a set of transition probabilities between phoneme pairs. Because of the interest in automatic language identification for worldwide voice communication, we used telephone speech for this study. The data were drawn from the OGI-TS (Oregon Graduate Institute Telephone Speech) corpus, a standard corpus for this type of research. To investigate the basic issues of this approach, two languages, Japanese and English, were selected. The language classification algorithms are based on a Viterbi search constrained by a bigram grammar and by minimum and maximum durations. Using a phoneme classifier trained only on English phonemes, we achieved 81.1% accuracy; using a phoneme classifier trained on Japanese phonemes, we achieved 79.3%. Using the English and Japanese phoneme classifiers together gave our best result, 83.3%. These results are comparable to those obtained with other methods, such as hidden Markov model (HMM)-based approaches.
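The abstract names the components (per-language phoneme-pair transition probabilities, and a Viterbi search constrained by a bigram grammar and by minimum and maximum durations) but not their exact formulation. The following is a minimal sketch under stated assumptions, not the author's implementation. It assumes that the phoneme classifier's output arrives as per-frame log posteriors, that every hypothesized phoneme segment must last between `min_dur` and `max_dur` frames (a single limit for all phonemes is a simplification), and that each segment boundary is scored by the language's bigram probability. The function names `estimate_bigram`, `viterbi_score`, and `identify_language`, the add-one smoothing, and the uniform initial probabilities are hypothetical choices. The toy posteriors stand in for multilayer-perceptron outputs and are not data from the study.

```python
import math
from collections import Counter

NEG_INF = float("-inf")


def estimate_bigram(sequences, n_phonemes, alpha=1.0):
    """Return log P(next | prev) estimated from phoneme-index sequences.

    Add-alpha smoothing (an assumption of this sketch) keeps unseen
    phoneme pairs at a small nonzero probability.
    """
    counts = Counter()
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[(prev, nxt)] += 1
    log_trans = []
    for i in range(n_phonemes):
        row_total = sum(counts[(i, j)] for j in range(n_phonemes))
        denom = row_total + alpha * n_phonemes
        log_trans.append([math.log((counts[(i, j)] + alpha) / denom)
                          for j in range(n_phonemes)])
    return log_trans


def viterbi_score(log_post, log_trans, min_dur, max_dur):
    """Best log score over phoneme segmentations of log_post (T x N).

    Every segment is one phoneme lasting min_dur..max_dur frames. A segment
    scores the sum of that phoneme's frame log posteriors, and each boundary
    between segments adds log_trans[prev][next]. The first segment gets a
    uniform initial probability (an assumption of this sketch).
    """
    n_frames, n_ph = len(log_post), len(log_trans)
    # cum[t][j] = sum of log_post[0..t-1][j], so segment [s, t) of
    # phoneme j scores cum[t][j] - cum[s][j].
    cum = [[0.0] * n_ph]
    for frame in log_post:
        cum.append([c + p for c, p in zip(cum[-1], frame)])
    log_init = -math.log(n_ph)
    # best[t][j]: best score of frames 0..t-1 whose last segment is phoneme j.
    best = [[NEG_INF] * n_ph for _ in range(n_frames + 1)]
    for t in range(1, n_frames + 1):
        for j in range(n_ph):
            for dur in range(min_dur, min(max_dur, t) + 1):
                s = t - dur
                seg = cum[t][j] - cum[s][j]
                if s == 0:
                    prev = log_init
                else:
                    prev = max(best[s][i] + log_trans[i][j] for i in range(n_ph))
                best[t][j] = max(best[t][j], prev + seg)
    return max(best[n_frames])


def identify_language(log_post, grammars, min_dur=2, max_dur=20):
    """Score the same posteriors under each language's grammar; pick the best."""
    scores = {lang: viterbi_score(log_post, g, min_dur, max_dur)
              for lang, g in grammars.items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Toy 3-phoneme inventory; "A" favors 0->1->2, "B" favors 0->2->1.
    grammars = {
        "A": estimate_bigram([[0, 1, 2, 0, 1, 2, 0, 1, 2]], 3),
        "B": estimate_bigram([[0, 2, 1, 0, 2, 1, 0, 2, 1]], 3),
    }
    # Six frames whose made-up posteriors favor phonemes 0, 0, 1, 1, 2, 2.
    probs = [[0.8, 0.1, 0.1]] * 2 + [[0.1, 0.8, 0.1]] * 2 + [[0.1, 0.1, 0.8]] * 2
    log_post = [[math.log(p) for p in frame] for frame in probs]
    lang, scores = identify_language(log_post, grammars)
    print(lang, scores)
```

The same posterior matrix is scored under every language's grammar, which mirrors the setup in which one phoneme classifier (English or Japanese) is shared across both languages and only the grammar changes. On the toy input, the script reports language `A`, because the frame sequence 0, 1, 2 matches the transitions that grammar "A" favors.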
