Training Conditional Random Fields Using Partial Annotations for Domain Adaptation of Japanese Word Segmentation

Abstract

This paper focuses on partial annotation, in which word-boundary information is given for only part of a sentence. By annotating only the crucial parts of sentences, or the parts that require little effort, we can efficiently create training data for each new target domain. We propose a training algorithm for conditional random fields (CRFs) that uses such partial annotations. CRFs are known to be well suited to word segmentation and many other sequence labeling problems in NLP, but conventional CRF training algorithms require fully annotated sentences. The proposed method uses a marginal likelihood as its objective function, which makes it possible to estimate CRF parameters from partially annotated sentences. In domain adaptation experiments on Japanese word segmentation, we show that partial annotations effectively improve segmentation performance.
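The central idea, replacing the usual log-likelihood with a marginal likelihood that sums over every label sequence consistent with the partial annotation, can be illustrated with a restricted linear-chain CRF forward algorithm. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the two-label boundary scheme (0 = no word boundary after a character, 1 = boundary), and the score tables `emit` and `trans` are hypothetical; a real model would compute these scores from weighted features.

```python
import math
from typing import List, Optional, Sequence, Set


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x))) that tolerates -inf entries."""
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))


def log_partition(emit: List[List[float]], trans: List[List[float]],
                  allowed: List[Set[int]]) -> float:
    """Forward algorithm summing only over sequences with y_t in allowed[t].

    emit[t][y]  : score of label y at position t (hypothetical scores)
    trans[p][y] : score of the transition from label p to label y
    """
    n_labels = len(trans)
    alpha = [emit[0][y] if y in allowed[0] else -math.inf
             for y in range(n_labels)]
    for t in range(1, len(emit)):
        alpha = [
            logsumexp([alpha[p] + trans[p][y] for p in range(n_labels)]) + emit[t][y]
            if y in allowed[t] else -math.inf
            for y in range(n_labels)
        ]
    return logsumexp(alpha)


def marginal_log_likelihood(emit: List[List[float]], trans: List[List[float]],
                            partial: List[Optional[int]]) -> float:
    """log of the total probability of all label sequences consistent with partial.

    partial[t] is the annotated label at position t, or None if unannotated.
    """
    everything = set(range(len(trans)))
    constrained = [everything if lab is None else {lab} for lab in partial]
    unconstrained = [everything] * len(emit)
    return (log_partition(emit, trans, constrained)
            - log_partition(emit, trans, unconstrained))


# Hypothetical scores for a 4-character sentence.
emit = [[0.2, 1.0], [0.5, -0.3], [1.2, 0.1], [-0.4, 0.8]]
trans = [[0.3, -0.1], [0.0, 0.4]]
partial = [None, 1, None, None]  # only the boundary after character 2 is annotated
print(marginal_log_likelihood(emit, trans, partial))
```

Unannotated positions leave every label allowed, so a sentence with no annotation contributes zero to the objective, while a fully annotated sentence reduces to the ordinary CRF log-likelihood. For gradient-based training, the gradient of this objective with respect to the feature weights is the expected feature count under the constrained distribution minus the expected count under the unconstrained one; both can be computed with the same restricted forward-backward passes.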

Journal

  • IPSJ Journal (情報処理学会論文誌)

    IPSJ Journal 50(6), 1622-1635, 2009-06-15

Cited by:  4

Codes

  • NII Article ID (NAID)
    110007970452
  • NII NACSIS-CAT ID (NCID)
    AN00116647
  • Text Lang
    JPN
  • Article Type
    Journal Article
  • ISSN
    1882-7764
  • Data Source
    CJPref  NII-ELS  IPSJ 