Important Sentence Extraction Using Support Vector Machines (Support Vector Machineを用いた重要文抽出法)

Tsutomu Hirao, Hideki Isozaki, Eisaku Maeda, Yuji Matsumoto

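Important sentence extraction, which selects the sentences carrying important information from a document, is one form of text summarization and a foundational technique for producing more natural summaries. Improving extraction accuracy requires handling multiple cues in an integrated and effective way, so sentence extraction methods that incorporate machine learning are attracting growing attention. In this paper, we propose a sentence extraction method based on the Support Vector Machine (SVM), a machine learning method regarded as having high generalization ability. In evaluation experiments on the Text Summarization Challenge (TSC) data, we demonstrated that the proposed method outperforms conventional methods such as the Lead method by a statistically significant margin. Evaluation experiments on the data of Nomoto et al. yielded comparable results. Furthermore, we showed that taking document genre into account improves extraction accuracy, and we clarified how the features effective for sentence extraction differ by genre.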

Extracting from a text the sentences that contain important information is a form of text summarization. If done accurately, it supports the automatic generation of summaries similar to those written by humans. To achieve this, the algorithm must be able to handle heterogeneous information. Therefore, parameter tuning by machine learning techniques has received attention. In this paper, we propose a method of sentence extraction based on Support Vector Machines (SVMs). To confirm the performance of our method, we conduct experiments on the Text Summarization Challenge (TSC) corpus and Nomoto's corpus. Results on the former show that our method outperforms the Lead-based method by a statistically significant margin. Moreover, we find that document genre is important for extraction performance, and we clarify which features are effective for each genre.
