局所的要約知識の自動獲得手法 / A new approach to acquiring linguistic knowledge for locally summarizing Japanese news sentences

Abstract

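This paper describes a method for automatically acquiring, from a corpus, the summarization knowledge needed to locally summarize Japanese news. Local summarization is summarization performed using information in the neighborhood of the point of interest (local information). Local information includes the point of interest itself and the word sequences before and after it. The method uses transformation rules and transformation conditions as summarization knowledge and acquires them automatically from a corpus of paired original and summary sentences. First, the distance between words is computed for every combination of a word in the original sentence and a word in the summary sentence, and DP matching finds the optimal word alignment. From that result, transformation rules are acquired as the word sequences that do not match in the alignment. Transformation conditions are acquired as the n-gram word sequences before and after a transformation rule. Using NHK news manuscripts as the originals and NHK teletext manuscripts as the summaries, we acquired summarization knowledge automatically and ran experiments to evaluate it. The results confirmed that appropriate summarization knowledge can be acquired.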

This paper proposes a new approach to acquiring linguistic knowledge for summarization based on local context. Our summarization method shortens characters, words, and bunsetsu phrases using linguistic information about the words to be summarized and the words located immediately before and after them. The summarization knowledge, which consists of transformation rules and transformation conditions, is acquired automatically from a corpus of Japanese news. In this corpus, the original articles are taken from NHK news text and the human-written summaries from NHK teletext. The proposed method analyzes the original and summarized sentences with a Japanese morphological analyzer, then aligns the original words with the summarized words by DP matching based on the distances between word pairs. Transformation rules are acquired as the word sequences that differ in the alignment. Transformation conditions are extracted as the word n-grams adjacent to a transformation rule. We acquired linguistic knowledge from the NHK news corpus and, in a series of evaluation experiments, obtained a high accuracy rate.
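The acquisition steps in the abstract (word alignment by DP matching, rules from mismatched word sequences, conditions from neighboring n-grams) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the word distance or the alignment costs, so `word_distance`, `GAP`, and the function names `align` and `extract_rules` are all assumptions made for illustration, and the input is taken to be already tokenized (the paper uses a Japanese morphological analyzer for that step).

```python
from typing import List, Optional, Tuple

GAP = 1.0  # assumed cost of leaving a word unaligned (not from the paper)


def word_distance(a: str, b: str) -> float:
    """Hypothetical word distance: 0 for identical words, 1 otherwise."""
    return 0.0 if a == b else 1.0


def align(src: List[str], tgt: List[str]) -> List[Tuple[Optional[int], Optional[int]]]:
    """Align two word lists by dynamic programming (edit-distance style)."""
    n, m = len(src), len(tgt)
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * GAP
    for j in range(1, m + 1):
        cost[0][j] = j * GAP
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(
                cost[i - 1][j - 1] + word_distance(src[i - 1], tgt[j - 1]),
                cost[i - 1][j] + GAP,   # source word has no counterpart
                cost[i][j - 1] + GAP,   # summary word has no counterpart
            )
    # Trace back from the end to recover one optimal alignment.
    ops = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + word_distance(src[i - 1], tgt[j - 1]):
            ops.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + GAP:
            ops.append((i - 1, None))
            i -= 1
        else:
            ops.append((None, j - 1))
            j -= 1
    ops.reverse()
    return ops


def extract_rules(src: List[str], tgt: List[str], n: int = 1) -> List[dict]:
    """Collect mismatched spans as rules, with n source words on each side as conditions."""
    ops = align(src, tgt)

    def is_match(op) -> bool:
        i, j = op
        return i is not None and j is not None and src[i] == tgt[j]

    rules = []
    k = 0
    consumed = 0  # number of source words covered by ops[:k]
    while k < len(ops):
        if is_match(ops[k]):
            consumed += 1
            k += 1
            continue
        left = consumed
        old, new = [], []
        while k < len(ops) and not is_match(ops[k]):
            i, j = ops[k]
            if i is not None:
                old.append(src[i])
                consumed += 1
            if j is not None:
                new.append(tgt[j])
            k += 1
        rules.append({
            "rule": (tuple(old), tuple(new)),
            "before": tuple(src[max(0, left - n):left]),
            "after": tuple(src[consumed:consumed + n]),
        })
    return rules


if __name__ == "__main__":
    # Invented English example, used only to show the data flow.
    original = "the prime minister said on Monday that talks will continue".split()
    summary = "the PM said talks will continue".split()
    for r in extract_rules(original, summary):
        print(r)
```

Each returned entry pairs a candidate transformation rule (original word sequence, summarized word sequence) with the source words on either side of it. An empty summarized sequence corresponds to a deletion. A real system would replace the 0/1 distance with one that reflects how similar two words are, which is what lets the alignment pair a long expression with its abbreviation.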

Journal

  • Journal of Natural Language Processing

    Journal of Natural Language Processing 6(7), 73-92, 1999-10-10

    The Association for Natural Language Processing

References:  10

Cited by:  26

Codes

  • NII Article ID (NAID)
    10008829553
  • NII NACSIS-CAT ID (NCID)
    AN10472659
  • Text Lang
    JPN
  • Article Type
    Journal Article
  • ISSN
    13407619
  • NDL Article ID
    4888784
  • NDL Source Classification
    ZU8 (Bibliography, libraries, general yearbooks -- Libraries, documentation, archives)
  • NDL Call No.
    Z21-B168
  • Data Source
    CJP  CJPref  NDL  J-STAGE 