Morphological Analysis of Kana Literature in Early Middle Japanese

Bibliographic Information

Other Title
  • 中古仮名文学作品の形態素解析
  • チュウコ カナ ブンガク サクヒン ノ ケイタイソ カイセキ

Abstract

Accurate, advanced study of classical Japanese requires a morphologically annotated diachronic corpus. Building such a corpus requires automatic morphological analysis, but morphological analysis of classical Japanese has been considered difficult to implement. To address this, we developed a new electronic dictionary, "UniDic for Early Middle Japanese", which makes the analysis of classical Japanese practical. The dictionary was created by expanding the entries of UniDic for Contemporary Japanese and by building a training corpus of Early Middle Japanese for statistical machine learning. It achieves an accuracy of approximately 97% (approximately 96% when the target text contains unknown words) in analyzing kana literature of the Heian period. The dictionary enables research methods that were previously impossible for classical Japanese, including complex searches and statistical analyses. Because UniDic entries are regularized as Short Unit Words, which are designed to reduce discrepancies and maintain uniformity, users can compare analysis results across different literary works and periods. UniDic for Early Middle Japanese is freely available to the public and is used in constructing the Heian period series of the Corpus of Historical Japanese.
