A Large Vocabulary Continuous Speech Recognition Parser Based on A* Search Using Grammar Category-Pair Constraint

Akinobu Lee (李晃伸), Tatsuya Kawahara (河原達也), Shuji Doshita (堂下修司)

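We propose a method for efficient A* search in grammar-based continuous speech recognition (parsing) under large-vocabulary conditions. With a large vocabulary, the word network of hypotheses explodes during search, so the widely used one-pass beam search requires a wide beam and is inefficient. Moreover, next-word prediction by the grammar alone does not narrow down the candidates sufficiently. To address this, (1) a compact word-pair constraint extracted from the original grammar keeps the hypothesis network small, and (2) tree-structuring the word lexicon for each grammar category lets a powerful heuristic be computed efficiently. Furthermore, (3) the results of this first pass are indexed, and the words to expand are narrowed down using the acoustic matching results, which yields an efficient A* search for large vocabularies. We compared Julian, a general-purpose continuous speech recognition parser that implements this method, with a standard one-pass beam search decoder in recognition experiments on a 5,000-word-class grammar task. The proposed method searched with far less computation under a large vocabulary and ran stably on any grammar regardless of syntactic complexity. Finally, it achieved a word recognition accuracy of 91.4% with a processing time of about 2.2 times real time.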

We address an efficient A* search algorithm for grammar-based large vocabulary continuous speech recognition. While grammars can introduce long-distance constraints into the search, the expanded word hypothesis network grows huge under a large vocabulary, so conventional one-pass beam search needs an extremely wide beam to get optimum results. We propose an efficient two-pass search algorithm that (1) uses a word-pair constraint as a heuristic and (2) tree-organizes the word lexicon for each grammar category, so that the whole network is represented in a compact loop structure. Furthermore, (3) the words that survive the first pass are indexed to reduce the candidates accessed on the second pass. We developed a portable FSA-based CSR parser named Julian and compared its performance with a typical one-pass beam decoder on a 5,000-word task. Experimental results show that the proposed method achieves high accuracy with far less computation and works stably even with more complex grammars. Finally, our parser achieved a word accuracy of 91.2% with a processing time of 2.5 times real time.
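
As a rough illustration of point (1), the following minimal sketch derives a category-pair constraint from a toy finite-state grammar. It is not taken from Julian: the function names `category_pairs` and `allowed_by_pairs`, the `(source state, category, destination state)` encoding, and the toy grammar are all assumptions made for this example. The point it shows is that the pair constraint is a relaxation of the grammar: every sequence the grammar accepts satisfies it, but some sequences the grammar rejects satisfy it too, which is why such a constraint can drive a cheap first pass whose results then guide a stricter second pass.

```python
from collections import defaultdict


def category_pairs(transitions):
    """Collect every category pair (a, b) such that some state is entered
    by an arc labeled a and left by an arc labeled b.

    transitions: iterable of (src_state, category, dst_state) tuples
    (a hypothetical encoding of a finite-state grammar).
    """
    incoming = defaultdict(set)  # state -> categories on arcs entering it
    outgoing = defaultdict(set)  # state -> categories on arcs leaving it
    for src, cat, dst in transitions:
        outgoing[src].add(cat)
        incoming[dst].add(cat)

    pairs = set()
    for state, entering in incoming.items():
        for a in entering:
            for b in outgoing.get(state, ()):
                pairs.add((a, b))
    return pairs


def allowed_by_pairs(sequence, pairs):
    """Check a category sequence against the pair constraint only."""
    return all((a, b) in pairs for a, b in zip(sequence, sequence[1:]))


# Toy grammar accepting exactly the category sequence A B A C
# (start state 0, final state 4).
toy_grammar = [
    (0, "A", 1),
    (1, "B", 2),
    (2, "A", 3),
    (3, "C", 4),
]

pairs = category_pairs(toy_grammar)
```

Here `allowed_by_pairs(["A", "C"], pairs)` is true even though the toy grammar does not accept `A C`, while `allowed_by_pairs(["B", "C"], pairs)` is false. The looser constraint forgets which state the parser is in, which keeps the network compact at the cost of admitting extra sequences.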
