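Design and Evaluation of a Method for Mining Impressions of Text Based on Analysis of People's Impression Data (ユーザ印象評価データの分析に基づく印象マイニング手法の設計と評価)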

Bibliographic Information

Other Title
  • ユーザ インショウ ヒョウカ データ ノ ブンセキ ニ モトズク インショウ マイニング シュホウ ノ セッケイ ト ヒョウカ
  • Design and Evaluation of a Method for Mining Impressions of Text Based on Analysis of People's Impression Data

Abstract

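This paper proposes a method for numerically determining the strength of the impressions that people feel when reading newspaper articles, targeting three kinds of impressions: "Happy ⇔ Sad," "Glad ⇔ Angry," and "Peaceful ⇔ Strained." To compute the strength of an impression (that is, the impression value), the influence that features extracted from an article have on its impression must be quantified and registered in an impression lexicon. In prior work, the authors proposed a method that computes impression values with high accuracy by using high-order regression analysis to formulate the correspondence between the impression values of an article computed with each impression lexicon and the strength of the impressions felt by people who read that article. This paper further focuses on the fact that the impressions are not independent of one another, and proposes a method that recomputes each impression value with higher accuracy by using multiple regression analysis to formulate the correspondence between the strength of the impressions people feel and the three impression values computed with the prior method. When the accuracy of the proposed method on unseen data was examined by five-fold cross-validation, the mean errors for the respective impressions were 0.60, 0.49, and 0.52 on a seven-point rating scale from 1 to 7. Since the mean errors of the previously proposed method were 0.69, 0.49, and 0.64, the errors for "Happy ⇔ Sad" and "Peaceful ⇔ Strained" are substantially improved, while the error for "Glad ⇔ Angry" remains the same.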

The authors investigate the impressions people gain from reading newspaper articles, and propose a method for quantifying the strength of these impressions. Our target impressions are limited to those represented by three bipolar scales, “Happy - Sad,” “Glad - Angry,” and “Peaceful - Strained.” In order to compute the strength of each impression as an “impression value,” that is, a real number between 1 and 7, it is generally necessary to quantify how strongly the features extracted from articles influence their impressions and to record these quantities in an impression lexicon. An impression lexicon is usually constructed for each kind of impression and is used only to compute the impression value for that impression. We have already proposed a method for reducing the divergence between the values computed using each impression lexicon and those judged by readers, and our experimental results showed that the average root-mean-square errors (RMSEs) for unlearned data were 0.69, 0.49, and 0.64 for the respective impressions. In this paper, we focus on the fact that the impressions are not independent of one another and adopt a new approach that recalculates each value from the values computed by the previous method. That is, we apply multiple regression analysis for each impression, where the values computed from articles using the previous method are used as the explanatory variables, and the average of the values that respondents gave each article on the corresponding scale in questionnaire surveys is used as the objective variable. Consequently, we obtain a multiple regression equation for each impression, which represents the correspondence between these variables. We also perform five-fold cross-validation using the data obtained in the surveys to verify the effectiveness of the proposed method. The results show that the average RMSEs for unlearned data are 0.60, 0.49, and 0.52 for the respective impressions. This means that the average RMSEs were greatly reduced for the “Happy - Sad” and “Peaceful - Strained” scales, while the average RMSE for the “Glad - Angry” scale stayed the same.
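
The recalculation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the data below are synthetic, the coefficients used to generate them are arbitrary, and the function names (`fit_multiple_regression`, `cross_validated_rmse`) are hypothetical. The sketch only assumes what the abstract states: the three impression values from the previous method serve as explanatory variables, the mean reader rating on one scale is the objective variable, and accuracy on held-out data is measured as the average RMSE over five folds.

```python
import numpy as np

def fit_multiple_regression(X, y):
    """Least-squares fit of y ~ b0 + X @ b. Returns [b0, b1, ..., bk]."""
    A = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(coef, X):
    """Apply a fitted regression equation to new explanatory variables."""
    return coef[0] + X @ coef[1:]

def rmse(y_true, y_pred):
    """Root-mean-square error between ratings and computed values."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def cross_validated_rmse(X, y, n_folds=5, seed=0):
    """Average RMSE on held-out folds (five-fold by default)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    scores = []
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        coef = fit_multiple_regression(X[train], y[train])
        scores.append(rmse(y[test], predict(coef, X[test])))
    return float(np.mean(scores))

# Synthetic stand-in data (an assumption, not the survey data from the paper).
# Columns of `previous_values`: impression values on the three scales as
# computed by an earlier method, each within the 1-7 range.
rng = np.random.default_rng(42)
n_articles = 200
previous_values = rng.uniform(1, 7, size=(n_articles, 3))

# Hypothetical mean reader ratings for the first scale. They depend on all
# three computed values, mimicking impressions that are not independent.
ratings = np.clip(
    0.5
    + 0.70 * previous_values[:, 0]
    + 0.10 * previous_values[:, 1]
    + 0.15 * previous_values[:, 2]
    + rng.normal(0.0, 0.3, n_articles),
    1.0,
    7.0,
)

# Baseline: use the first computed value directly as the prediction.
baseline_rmse = rmse(ratings, previous_values[:, 0])
# Recalculated: regress the ratings on all three computed values.
recalculated_rmse = cross_validated_rmse(previous_values, ratings)

print(f"baseline RMSE:     {baseline_rmse:.2f}")
print(f"recalculated RMSE: {recalculated_rmse:.2f}")
```

On this synthetic data the recalculated values track the ratings more closely than the single-scale baseline because the ratings were generated to depend on all three scales. Whether the same holds for real survey data is exactly what the cross-validation reported in the abstract evaluates.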
