- Zhou Shuangshuang, Graduate School of Information Sciences, Tohoku University
- Okazaki Naoaki, Graduate School of Information Sciences, Tohoku University
- Matsuda Koji, Graduate School of Information Sciences, Tohoku University
- Tian Ran, Graduate School of Information Sciences, Tohoku University
- Inui Kentaro, Graduate School of Information Sciences, Tohoku University
Abstract
<p>Wikification is the task of connecting mentions in texts to entities in a large-scale knowledge base, Wikipedia. In this paper, we present a pipeline system for Japanese Wikification that consists of two components: candidate generation and candidate ranking. We investigate several techniques for each component, using a recently developed Japanese Wikification corpus. For candidate generation, we find that a name dictionary built from the anchor texts of Wikipedia is more effective than other methods based on surface-form similarity. For candidate ranking, we verify that a set of features used in English Wikification is effective in Japanese Wikification as well. In addition, by using a corpus that links mentions to Japanese Wikipedia entries instead of English Wikipedia entries, we are able to acquire rich contextual information from Japanese Wikipedia articles, which leads to improvements in Japanese mention disambiguation. We exploit this advantage by exploring several embedding models that encode the context information of Wikipedia entities. The experimental results demonstrate that these models improve candidate ranking. We also report the effect of each feature in detail. In sum, our system achieves 81.60% accuracy, significantly outperforming previous work.</p>
Published in
- Journal of Information Processing
Journal of Information Processing 25 (0), 341-350, 2017
Information Processing Society of Japan
Details
- CRID: 1390001205295565312
- NII Article ID: 130006900180
- ISSN: 18826652
- Text language code: en
- Data source type: JaLC, Crossref, CiNii Articles, KAKEN
- Abstract license flag: Use not permitted