被験者判定のゆれと要約モデル ("Variability in Human Judgments and Summarization Models") / Exploiting Human Judgments for Automatic Text Summarization: An Empirical Comparison [in Japanese]

Abstract

This paper models human judgments of sentence importance under supervised and unsupervised learning paradigms and examines how properties of the data relate to the accuracy of particular models. Specifically, we compare a method based on probabilistic decision trees, which learns directly from the judgment data, with a clustering-based method that does not consult the judgment data at all, and examine how the accuracy of each relates to the degree of agreement among judges. The experiments confirmed the following: (a) the clustering method generally performs better than the decision tree method; (b) the clustering method is little affected by variation among judges, whereas the performance of the decision tree method depends on that variation; (c) the performance of both methods is strongly affected by text structure.
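This record does not say which agreement statistic the paper uses. As an illustration only, assuming two judges who each mark every sentence as important (1) or not (0), Cohen's kappa is one standard way to measure how far their agreement exceeds chance. The helper name `cohens_kappa` and the sample labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(judge_a, judge_b):
    """Cohen's kappa for two judges labelling the same sentences.

    Hypothetical helper, not taken from the paper. Assumption: labels are
    binary, 1 = sentence judged important, 0 = not important.
    """
    if len(judge_a) != len(judge_b) or not judge_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(judge_a)
    # Observed agreement: share of sentences on which the two judges agree.
    p_o = sum(x == y for x, y in zip(judge_a, judge_b)) / n
    # Chance agreement: product of each judge's marginal label frequencies.
    count_a, count_b = Counter(judge_a), Counter(judge_b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1.0:
        # Both judges used one identical label everywhere; kappa is undefined,
        # and this sketch simply reports perfect agreement.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Two judges marking ten sentences (made-up illustrative labels).
a = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
b = [1, 0, 0, 0, 1, 0, 1, 0, 1, 0]
print(round(cohens_kappa(a, b), 3))  # 0.583
```

A kappa of 1 indicates perfect agreement and 0 indicates agreement no better than chance.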

The paper empirically examines how variation in human judgments on sentence extraction affects the performance of summarizers. In particular, we are concerned with how summarizers from two different learning paradigms, supervised and unsupervised, fare when set to the task of extracting a summary from a text. We build a supervised summarizer on a probabilistic decision tree and an unsupervised summarizer on K-means clustering, and compare the performance of the two approaches, and of some variations on them, on data elicited from human subjects. We found that, most of the time, the clustering approach outperforms the supervised approach. Somewhat to our surprise, we also found that variability in judgment has no significant effect on how well the clustering-based approach performs, in contrast to the supervised approach, which is hurt by the variability. Another notable result is that what we might call the topical structure of a text apparently influences the performance of the summarizers, whether supervised or unsupervised.
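The abstract names K-means clustering as the basis of the unsupervised summarizer but does not describe how clusters are turned into a summary. The following is a minimal sketch under stated assumptions, not the authors' method: sentences are represented as bag-of-words term-frequency vectors, K-means uses a deterministic farthest-first initialization, and the summary keeps the sentence nearest each cluster centroid. All function names are hypothetical. As in the unsupervised setting described above, no human judgments are consulted.

```python
import re
from collections import Counter


def bow_vectors(sentences):
    """Term-frequency vectors over a shared vocabulary of lowercased word tokens."""
    tokenized = [re.findall(r"\w+", s.lower()) for s in sentences]
    vocab = sorted({w for toks in tokenized for w in toks})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for toks in tokenized:
        v = [0.0] * len(vocab)
        for w, c in Counter(toks).items():
            v[index[w]] = float(c)
        vectors.append(v)
    return vectors


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(vectors, k, iters=20):
    """Lloyd's K-means with deterministic farthest-first initialization (an assumption)."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(far))
    assign = None
    for _ in range(iters):
        # Assignment step: each vector goes to its nearest centroid.
        new_assign = [min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vectors]
        if new_assign == assign:
            break  # converged: assignments no longer change
        assign = new_assign
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [v for v, a in zip(vectors, assign) if a == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centroids


def extract_summary(sentences, k):
    """Hypothetical extractor: cluster into k groups, keep the sentence nearest each centroid."""
    k = min(k, len(sentences))
    if k < 1:
        return []
    vectors = bow_vectors(sentences)
    assign, centroids = kmeans(vectors, k)
    chosen = set()
    for j in range(k):
        members = [i for i, a in enumerate(assign) if a == j]
        if members:
            chosen.add(min(members, key=lambda i: sq_dist(vectors[i], centroids[j])))
    return [sentences[i] for i in sorted(chosen)]  # restore original text order


doc = [
    "The cat sat on the mat.",
    "A cat sat on a mat.",
    "Stock prices rose sharply today.",
    "Stock prices fell sharply today.",
]
print(extract_summary(doc, 2))
# ['The cat sat on the mat.', 'Stock prices rose sharply today.']
```

Here the number of clusters `k` doubles as the summary length. The farthest-first initialization is chosen only to make the example deterministic.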

Journal

  • IPSJ journal

    IPSJ journal 45(3), 794-808, 2004-03-15

    Information Processing Society of Japan (IPSJ)

References:  31

Codes

  • NII Article ID (NAID)
    110002712131
  • NII NACSIS-CAT ID (NCID)
    AN00116647
  • Text Lang
    JPN
  • Article Type
    Journal Article
  • ISSN
    1882-7764
  • NDL Article ID
    6885115
  • NDL Source Classification
    ZM13 (Science and Technology -- Science and Technology in General -- Data Processing, Computers)
  • NDL Call No.
    Z14-741
  • Data Source
    CJP  NDL  NII-ELS  IPSJ 