Extracting Bilingual Word Pairs with Maximum Entropy Modeling

  • SATO KENGO
    Department of Computer Science, Keio University
  • SAITO HIROAKI
    Department of Information and Computer Science, Keio University

Bibliographic Information

Other Title
  • 最大エントロピー法を用いた対訳単語対の抽出
  • サイダイ エントロピーホウ オ モチイタ タイヤク タンゴ ツイ ノ チュウシュツ

Abstract

Translation dictionaries used in multilingual natural language processing, such as machine translation, have traditionally been compiled by hand, which requires a great deal of labor and makes it difficult to keep the dictionary entries consistent. Research on automatically extracting bilingual word pairs from parallel corpora has therefore become active in recent years. In this paper, we propose a method for learning and extracting bilingual word pairs from aligned parallel corpora using maximum entropy modeling. We define a probabilistic model of bilingual word pairs and four types of feature functions that express statistical and linguistic properties such as co-occurrence information and morphological information. Co-occurrence information constrains the sense of a word, and morphological information constrains its part of speech. Experiments on Japanese-English parallel corpora show that our method outperforms previous methods and that, owing to the properties of maximum entropy modeling, it can extract bilingual word pairs that do not appear in the training corpus with almost the same accuracy as pairs that do.
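
As a rough illustration of the kind of model the abstract describes, the sketch below scores candidate English translations of a Japanese word with a log-linear (maximum entropy) distribution, p(e | j) = exp(sum_i w_i f_i(j, e)) / Z(j). Everything concrete in it is an assumption made for illustration and is not taken from the paper: the toy corpus, the part-of-speech table, the two binary feature functions (the paper defines four types, which are not specified here), the Dice threshold of 0.8, and the hand-set weights, which a real system would estimate from training data.

```python
import math

# Toy sentence-aligned corpus (hypothetical): (Japanese tokens, English tokens).
CORPUS = [
    (["inu", "hashiru"], ["dog", "runs"]),
    (["inu", "taberu"], ["dog", "eats"]),
    (["neko", "hashiru"], ["cat", "runs"]),
]

# Hypothetical part-of-speech tags standing in for morphological information.
POS = {
    "inu": "N", "neko": "N", "hashiru": "V", "taberu": "V",
    "dog": "N", "cat": "N", "runs": "V", "eats": "V",
}


def dice(j, e):
    """Dice coefficient of j and e over aligned sentence pairs."""
    both = sum(1 for js, es in CORPUS if j in js and e in es)
    n_j = sum(1 for js, _ in CORPUS if j in js)
    n_e = sum(1 for _, es in CORPUS if e in es)
    return 2.0 * both / (n_j + n_e) if (n_j + n_e) else 0.0


def f_cooc(j, e):
    """Binary co-occurrence feature: fires when the Dice score is high."""
    return 1.0 if dice(j, e) >= 0.8 else 0.0  # threshold is an assumption


def f_pos(j, e):
    """Binary morphological feature: fires when the POS tags agree."""
    return 1.0 if POS.get(j) == POS.get(e) else 0.0


FEATURES = [f_cooc, f_pos]
WEIGHTS = [2.0, 1.0]  # assumed values; normally learned, not set by hand


def p_translation(j, candidates):
    """Return p(e | j) over the candidates under the log-linear model."""
    scores = {
        e: math.exp(sum(w * f(j, e) for w, f in zip(WEIGHTS, FEATURES)))
        for e in candidates
    }
    z = sum(scores.values())  # normalizing constant Z(j)
    return {e: s / z for e, s in scores.items()}


dist = p_translation("inu", ["dog", "cat", "runs", "eats"])
best = max(dist, key=dist.get)
```

Because the score depends only on which features fire, a pair never seen together in training can still receive a reasonable probability when, for example, its part-of-speech feature fires. This is one plausible reading of the abstract's remark about unseen pairs, not a description of the authors' exact mechanism.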
