コンパラブルコーパスと対訳辞書による日英クロス言語検索

奥村 明俊, 石川 開, 佐藤 研治

doi:10.5715/jnlp.5.4_77

Bibliographic Information

Alternative Title

Japanese-English Cross Language Information Retrieval based on Comparable Corpora and Bilingual Dictionary

Abstract

This paper proposes a method for translating query terms in cross-language information retrieval (CLIR). CLIR is generally performed in two steps: query translation followed by information retrieval (IR). CLIR is less precise than monolingual IR because query term translation is ambiguous, especially in Japanese-English CLIR. We developed the Double MAXimize criteria based on comparable corpora (DMAX), an equivalent-translation selection method for machine translation (MT) that uses term co-occurrence frequencies in comparable corpora. Whereas MT should translate a term into a single word, CLIR should translate a query term into several appropriate terms. This paper describes GDMAX, a generalized query term selection model for CLIR. In this model, a source query is represented as a vector of term co-occurrence frequencies in the source corpora. Translated queries are found by computing the vector similarity between the source query and target queries, which are represented by co-occurrence frequencies in the comparable target corpora. GDMAX was evaluated using TREC-6 (Text REtrieval Conference) English data and 15 Japanese queries. GDMAX queries achieved approximately 62% of the accuracy of human queries, 6% higher accuracy than machine translation queries, and 12% higher accuracy than bilingual dictionary-based queries.
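The vector-similarity step described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the function names, the toy co-occurrence counts, the use of cosine similarity, and the exhaustive enumeration of dictionary candidates are all hypothetical. The abstract does not specify how GDMAX scores candidates or how it selects several target terms per query term, so the sketch only ranks candidate combinations.

```python
from itertools import combinations, product
from math import sqrt


def cooccurrence_vector(terms, cooc):
    """Co-occurrence counts for every unordered pair of terms, in a fixed pair order."""
    return [cooc.get(frozenset((a, b)), 0) for a, b in combinations(terms, 2)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def rank_translations(source_terms, dictionary, source_cooc, target_cooc):
    """Rank every combination of dictionary candidates by similarity to the source query."""
    src_vec = cooccurrence_vector(source_terms, source_cooc)
    scored = []
    for candidate in product(*(dictionary[t] for t in source_terms)):
        tgt_vec = cooccurrence_vector(candidate, target_cooc)
        scored.append((cosine(src_vec, tgt_vec), candidate))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


# Toy data (hypothetical): a three-term Japanese query and invented counts.
source_terms = ["銀行", "金利", "上昇"]
dictionary = {
    "銀行": ["bank", "embankment"],
    "金利": ["interest"],
    "上昇": ["rise", "ascent"],
}
source_cooc = {
    frozenset(("銀行", "金利")): 30,
    frozenset(("銀行", "上昇")): 5,
    frozenset(("金利", "上昇")): 20,
}
target_cooc = {
    frozenset(("bank", "interest")): 40,
    frozenset(("bank", "rise")): 8,
    frozenset(("interest", "rise")): 25,
    frozenset(("bank", "ascent")): 1,
    frozenset(("interest", "ascent")): 1,
    frozenset(("embankment", "rise")): 3,
    frozenset(("embankment", "ascent")): 2,
}

ranked = rank_translations(source_terms, dictionary, source_cooc, target_cooc)
for score, candidate in ranked:
    print(f"{score:.3f}  {candidate}")
```

With these invented counts, the combination ("bank", "interest", "rise") ranks first because its pairwise co-occurrence pattern in the target data most closely matches that of the source query. Keeping the top few combinations rather than only the first is one possible way to obtain several target terms per query term.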

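Journal

Journal of Natural Language Processing (自然言語処理)

Journal of Natural Language Processing 5 (4), 77-93, 1998

The Association for Natural Language Processing (一般社団法人 言語処理学会)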

Keywords

Details

CRID: 1390282679452223232

NII Article ID: 130004292057

DOI: 10.5715/jnlp.5.4_77

ISSN: 2185-8314; 1340-7619

Web Site: http://www.jstage.jst.go.jp/article/jnlp1994/5/4/5_4_77/_pdf

Data Source

JaLC
Crossref
CiNii Articles

Abstract License Flag: Disallowed
