HMM Parameter Learning for Japanese Morphological Analyzer

Author(s)

    • TAKEUCHI Koichi
    • Graduate School of Information Science, Nara Institute of Science and Technology
    • MATSUMOTO Yuji
    • Graduate School of Information Science, Nara Institute of Science and Technology

Abstract

This paper proposes a method for applying a Hidden Markov Model (HMM) to parameter learning for a Japanese morphological analyzer. Unlike English, Japanese text is written without spaces between words, so a simple HMM training procedure that starts from uniform initial probabilities gives poor accuracy. We therefore investigate experimentally how three factors affect HMM learning: 1) the initial values of the parameters, i.e., the initial probabilities; 2) grammatical constraints that hold in Japanese sentences independently of any domain; and 3) smoothing. The first experiments show that initial probabilities obtained from a correctly tagged corpus greatly improve HMM learning, and that even a small tagged corpus is effective for this purpose. We then show that, when grammatical constraints and smoothing of the re-estimated probabilities are applied, the HMM-based approach achieves accuracy equal to or better than that of a manually developed rule-based Japanese morphological analyzer.
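The three factors in the abstract can be illustrated with a minimal sketch. The abstract does not state which smoothing method or constraint representation the authors used, so additive smoothing and a set of forbidden tag pairs are assumptions made here for illustration only. The function name `estimate_initial_transitions`, the toy corpus, and the tag names are all hypothetical.

```python
from collections import Counter


def estimate_initial_transitions(tagged_sentences, tags, forbidden=frozenset(), alpha=1.0):
    """Estimate tag-transition probabilities from a small tagged corpus.

    tagged_sentences: list of sentences, each a list of (word, tag) pairs.
    tags: list of all part-of-speech tags.
    forbidden: (prev_tag, next_tag) pairs ruled out by grammatical constraints
               (assumed representation, not taken from the paper).
    alpha: additive smoothing constant (assumed smoothing method).
    """
    counts = Counter()
    for sentence in tagged_sentences:
        seq = [tag for _, tag in sentence]
        for prev, nxt in zip(seq, seq[1:]):
            counts[(prev, nxt)] += 1

    trans = {}
    for prev in tags:
        allowed = [t for t in tags if (prev, t) not in forbidden]
        total = sum(counts[(prev, t)] for t in allowed) + alpha * len(allowed)
        for t in tags:
            if (prev, t) in forbidden or total == 0:
                # Forbidden transitions get zero probability.
                trans[(prev, t)] = 0.0
            else:
                trans[(prev, t)] = (counts[(prev, t)] + alpha) / total
    return trans


# Toy tagged corpus (romanized, hypothetical data).
corpus = [[("neko", "noun"), ("ga", "particle"), ("hashiru", "verb")]]
tags = ["noun", "particle", "verb"]
forbidden = {("particle", "particle")}

trans = estimate_initial_transitions(corpus, tags, forbidden)
print(trans[("noun", "particle")])      # smoothed estimate from the corpus
print(trans[("particle", "particle")])  # ruled out by the constraint
```

Probabilities estimated this way could serve as the starting point for Baum-Welch re-estimation on untagged text. Transitions initialized to zero remain zero under that re-estimation, which is one way a grammatical constraint can be kept in force throughout training.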

Journal

  • Transactions of Information Processing Society of Japan

    Transactions of Information Processing Society of Japan 38(3), 500-509, 1997-03-15

    Information Processing Society of Japan (IPSJ)

References:  11

Cited by:  11

Codes

  • NII Article ID (NAID)
    110002721502
  • NII NACSIS-CAT ID (NCID)
    AN00116647
  • Text Lang
    JPN
  • Article Type
    Journal Article
  • ISSN
    1882-7764
  • NDL Article ID
    4159276
  • NDL Source Classification
    ZM13 (Science and Technology -- General Science and Technology -- Data Processing and Computers)
  • NDL Call No.
    Z14-741
  • Data Source
    CJP  CJPref  NDL  NII-ELS  IPSJ 