Identifying the Writer of a Text Using an Integrated Classification Algorithm (統合的分類アルゴリズムを用いた文章の書き手の識別)

  • 金 明哲
    Faculty of Culture and Information Science and Graduate School of Culture and Information Science, Doshisha University

Bibliographic information

Alternative titles
  • Using Integrated Classification Algorithm to Identify a Text's Author
  • トウゴウテキ ブンルイ アルゴリズム オ モチイタ ブンショウ ノ カキテ ノ シキベツ

Abstract

Text classification results often vary with the details of the analysis, including the feature data, the classification method, and the parameter settings. The author of an anonymous text can generally be identified by extracting a set of distinctive features from the text and then using those features to find the most likely author. Much effort has gone into developing more robust feature extraction techniques and classification algorithms, but an important open issue is how to select the feature datasets and the classification method. To address this issue, we propose an integrated classification algorithm that extracts multiple feature datasets from different viewpoints and aspects of a text and applies multiple strong classifiers to those datasets. The proposed method achieved 100% accuracy in identifying the authors of literary works and student essays, and correctly identified the author of all but 1 of 60 diaries written by 6 different people. Its accuracy was equivalent to or better than that of any single strong classifier applied to an individual feature dataset. Furthermore, the accuracy in identifying the authors of student essays increased by roughly two percentage points.
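
The idea in the abstract, several feature datasets crossed with several classifiers and merged into one decision, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the two feature views (character bigrams and whitespace-separated word frequencies), the two simple similarity-based classifiers standing in for the "strong classifiers", the majority vote used to combine them, and all function names. Whitespace tokenization in particular would not suit Japanese text, which needs a morphological analyzer.

```python
from collections import Counter
from math import sqrt

def char_ngrams(text, n=2):
    """Feature view 1 (assumed): relative frequencies of character n-grams."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    total = len(grams) or 1
    return {g: c / total for g, c in Counter(grams).items()}

def word_freqs(text):
    """Feature view 2 (assumed): relative frequencies of whitespace-split words."""
    words = text.lower().split()
    total = len(words) or 1
    return {w: c / total for w, c in Counter(words).items()}

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def nearest_neighbor(train, query):
    """Stand-in classifier A: label of the most similar training vector."""
    return max(train, key=lambda item: cosine(item[0], query))[1]

def nearest_centroid(train, query):
    """Stand-in classifier B: label whose mean vector is most similar."""
    sums, counts = {}, Counter()
    for vec, label in train:
        counts[label] += 1
        acc = sums.setdefault(label, Counter())
        for k, v in vec.items():
            acc[k] += v
    centroids = {
        lab: {k: v / counts[lab] for k, v in acc.items()}
        for lab, acc in sums.items()
    }
    return max(centroids, key=lambda lab: cosine(centroids[lab], query))

def integrated_predict(texts, labels, query_text,
                       extractors=(char_ngrams, word_freqs),
                       classifiers=(nearest_neighbor, nearest_centroid)):
    """Run every classifier on every feature view and take a majority vote.

    Ties go to the label that received its first vote earliest.
    """
    votes = Counter()
    for extract in extractors:
        train = [(extract(t), lab) for t, lab in zip(texts, labels)]
        query = extract(query_text)
        for classify in classifiers:
            votes[classify(train, query)] += 1
    return votes.most_common(1)[0][0]

# Toy corpus: two hypothetical authors, two texts each.
texts = [
    "the cat sat on the mat the cat",
    "the cat ate the fish on the mat",
    "quantum flux drives warp engines",
    "warp engines need quantum flux",
]
labels = ["A", "A", "B", "B"]

print(integrated_predict(texts, labels, "the cat on the mat"))
```

Voting over all feature-view and classifier pairs means no single choice of features or method has to be made in advance, which is the selection problem the abstract raises. A real study would replace the stand-in classifiers with stronger learners and evaluate with held-out texts.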

Published in

  • 行動計量学 (The Japanese Journal of Behaviormetrics)

    行動計量学 41 (1), 35-46, 2014

    日本行動計量学会 (The Behaviormetric Society of Japan)
