Constructing a Language Model for an Open-Domain Spoken Question Answering System Based on Similar-Expression Expansion of Nouns

ヴァルガ, イシュトヴァーン, 大竹, 清敬, 鳥澤, 健太郎, デサーガ, ステイン, 翠, 輝久, 松田, 繁樹, 風間, 淳一

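This paper proposes a method for constructing the speech recognition language model used in "Ikkyu", an open-domain spoken question answering system. Ikkyu receives relatively short questions on a wide range of topics from users via a smartphone, searches a large-scale WWW corpus for the answers, and outputs them. The challenge is to build a language model that can accurately recognize spoken open-domain questions. By combining an existing domain adaptation method with expansion of a seed corpus based on the distributional similarity of nouns, we created a high-performance language model at low cost. From a seed corpus of 500 sentences and a WWW corpus of 600 million sentences, we built a language model covering about 410,000 words. It improved the word error rate by 3.25% over a baseline language model built from sentences randomly extracted from the WWW corpus.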
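The abstract does not spell out how the seed corpus is expanded. The following is a minimal sketch under stated assumptions, not the authors' procedure: each noun is assumed to be represented by a bag of context words, nouns are compared by cosine similarity (one common distributional similarity measure, not necessarily the one used in the paper), and each seed sentence is expanded by swapping one noun at a time for its top-`k` most similar nouns above a `threshold`. The names `similar_nouns`, `expand_seed`, `k`, `threshold`, and the toy vectors are hypothetical. The subsequent domain adaptation step is not shown.

```python
import math
from collections import Counter

# Hypothetical representation: each noun maps to a bag of context words
# observed around it in a large corpus (its distributional profile).
ContextVectors = dict[str, Counter]


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[key] * v[key] for key in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similar_nouns(target: str, vectors: ContextVectors, k: int, threshold: float) -> list[str]:
    """Return up to k nouns whose similarity to `target` is at least `threshold`."""
    scored = [
        (cosine(vectors[target], vec), noun)
        for noun, vec in vectors.items()
        if noun != target
    ]
    kept = [(score, noun) for score, noun in scored if score >= threshold]
    kept.sort(key=lambda pair: (-pair[0], pair[1]))  # best first, ties broken by name
    return [noun for _, noun in kept[:k]]


def expand_seed(seed: list[list[str]], vectors: ContextVectors,
                k: int = 2, threshold: float = 0.5) -> list[list[str]]:
    """Keep every seed sentence and add variants that swap one noun at a time."""
    expanded = []
    for tokens in seed:
        expanded.append(list(tokens))
        for i, token in enumerate(tokens):
            if token not in vectors:  # assumption: only nouns have a context profile
                continue
            for substitute in similar_nouns(token, vectors, k, threshold):
                expanded.append(tokens[:i] + [substitute] + tokens[i + 1:])
    return expanded


if __name__ == "__main__":
    toy_vectors = {  # hypothetical toy data, not from the paper
        "kyoto": Counter({"city": 3, "visit": 2, "temple": 1}),
        "nara": Counter({"city": 2, "visit": 2, "temple": 2}),
        "tokyo": Counter({"city": 4, "visit": 1}),
        "ramen": Counter({"eat": 3, "soup": 2}),
    }
    for sentence in expand_seed([["famous", "temples", "in", "kyoto"]], toy_vectors):
        print(" ".join(sentence))
```

On the toy data this prints the original sentence followed by the "nara" and "tokyo" variants; "ramen" shares no context words with "kyoto" and falls below the threshold. Substituting one noun at a time keeps the output linear in sentence length, whereas substituting all nouns jointly would grow combinatorially.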

This work presents a novel language model construction method for speech recognition, used in “Ikkyu”, an open-domain spoken question answering system. Ikkyu accepts relatively short spoken questions on a wide variety of topics as input through a smartphone and returns answers retrieved from a large-scale Web archive. Our challenge is to construct a language model that can accurately recognize spoken open-domain questions entered through smartphones. We tackle this problem by combining an existing domain adaptation method with seed-corpus expansion based on the distributional similarity of nouns. From 500 seed sentences and a corpus of 600 million Web pages, we constructed a language model covering 413,000 words. We achieved an average improvement of 3.25 points in word error rate (WER) over a baseline model constructed from randomly sampled Web sentences.
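The evaluation is reported as a WER improvement. As a reference point, here is a minimal sketch of the standard corpus-level WER, (substitutions + deletions + insertions) divided by the number of reference words, computed with a word-level Levenshtein distance. Following the English abstract's wording ("points"), the improvement is taken as the absolute difference between baseline and proposed WER. The function names and toy inputs are hypothetical, and whitespace tokenization is an assumption: Japanese transcripts would first need word segmentation.

```python
def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Minimum substitutions + deletions + insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # delete ref[i-1]
                         cur[j - 1] + 1,      # insert hyp[j-1]
                         prev[j - 1] + cost)  # substitute or match
        prev = cur
    return prev[-1]


def corpus_wer(pairs: list[tuple[str, str]]) -> float:
    """Corpus-level WER in percent: total word errors / total reference words."""
    errors = sum(edit_distance(ref.split(), hyp.split()) for ref, hyp in pairs)
    words = sum(len(ref.split()) for ref, _ in pairs)
    if words == 0:
        raise ValueError("references contain no words")
    return 100.0 * errors / words


def wer_improvement_points(baseline: list[tuple[str, str]],
                           proposed: list[tuple[str, str]]) -> float:
    """Absolute WER reduction (baseline minus proposed), in percentage points."""
    return corpus_wer(baseline) - corpus_wer(proposed)


if __name__ == "__main__":
    # Hypothetical (reference, hypothesis) pairs for two language models.
    baseline = [("where is the golden pavilion", "where is a golden pavilions")]
    proposed = [("where is the golden pavilion", "where is a golden pavilion")]
    print(corpus_wer(baseline), corpus_wer(proposed))
    print(wer_improvement_points(baseline, proposed))
```

Pooling errors over the whole test set, rather than averaging per-sentence WER, keeps short questions from dominating the score.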
