Improving Spoken Document Retrieval Accuracy by Making Various Text Retrieval Models More Robust

市川 賢, 北岡 教英, 柘植 覚, 武田 一哉, 北 研二

We modify three conventional text retrieval methods (those based on the vector space model, the query likelihood model, and the relevance model) within a unified framework, so that spoken document retrieval becomes robust to out-of-vocabulary words and speech recognition errors, and we compare the resulting methods. For each retrieval method we propose a new query expansion technique, together with a technique that treats syllable trigrams in the same way as words and combines this syllable-based retrieval with word-based retrieval at the score level. We evaluated the proposed methods on the NTCIR-9 SpokenDoc task. Each modified method achieved better retrieval performance than its baseline. The methods based on probabilistic models, namely the query likelihood model and the relevance model, performed particularly well and outperformed the method based on the vector space model. The proposed methods also outperformed the best official result published for NTCIR-9.
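The score-level combination described in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas, so everything specific here is an assumption: the function names, Dirichlet smoothing for the query likelihood score, the add-one floor for unseen terms, and linear interpolation with a hypothetical weight `alpha` as the combination rule. It shows only the general idea of scoring a document once with word tokens and once with syllable-trigram tokens, then merging the two scores.

```python
import math
from collections import Counter


def syllable_trigrams(syllables):
    """Turn a syllable sequence into overlapping 3-grams, one token each."""
    return ["_".join(syllables[i:i + 3]) for i in range(len(syllables) - 2)]


def query_likelihood(query, doc, coll_counts, coll_len, mu=1000.0):
    """Log P(query | doc) with Dirichlet smoothing (assumed, not from the abstract)."""
    tf = Counter(doc)
    vocab = len(coll_counts)
    score = 0.0
    for term in query:
        # Add-one floor keeps unseen terms from producing log(0).
        p_coll = (coll_counts.get(term, 0) + 1) / (coll_len + vocab + 1)
        score += math.log((tf[term] + mu * p_coll) / (len(doc) + mu))
    return score


def combine_scores(word_scores, syllable_scores, alpha=0.5):
    """Linearly interpolate two per-document score tables (alpha is hypothetical).

    Both tables are assumed to cover the same document ids.
    """
    return {
        doc_id: alpha * word_scores[doc_id] + (1.0 - alpha) * syllable_scores[doc_id]
        for doc_id in word_scores
    }
```

In this sketch the same scoring function serves both token types, which mirrors the abstract's statement that syllable trigrams are handled in the same way as words. In practice the two score tables would probably need to be normalized to a comparable range before interpolation.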
