Abstract
This paper analyzes the characteristics of a sentiment corpus created by manually annotating 10,000 sentences of product reviews. We have been building the corpus because we consider such a resource essential for the task of sentiment information extraction. By examining how sentiment information appears in the corpus, we consider how to approach sentiment information extraction in future work. To separate descriptions of a product's state from the evaluations made of it, we use a sentiment information model in which one unit consists of four elements: item, attribute, value, and evaluation. We conducted a statistical investigation of the frequency of surface expressions for each element and of the elements that are omitted in the corpus. We also compared cases in which an evaluation element appears with cases in which it does not, and examined cases in which the same attribute-value pair occurs with evaluations of different polarities. These comparisons allowed us to verify the effectiveness of modeling sentiment information as a four-element tuple.
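As an illustration only, the four-element unit (item, attribute, value, evaluation) and the two kinds of analysis mentioned in the abstract could be sketched as follows. The class name `SentimentUnit`, the `polarity` field, both helper functions, and the sample data are assumptions made for this sketch; they are not the annotation scheme or tooling used for the corpus.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SentimentUnit:
    # Hypothetical representation: any element is None when it is
    # omitted in the review text.
    item: Optional[str]
    attribute: Optional[str]
    value: Optional[str]
    evaluation: Optional[str]
    # Assumed label for the evaluation's polarity ("positive" / "negative").
    polarity: Optional[str] = None


def omitted_counts(units):
    """Count how often each of the four elements is omitted."""
    counts = {"item": 0, "attribute": 0, "value": 0, "evaluation": 0}
    for unit in units:
        for name in counts:
            if getattr(unit, name) is None:
                counts[name] += 1
    return counts


def conflicting_pairs(units):
    """Return attribute-value pairs that occur with more than one polarity."""
    polarities = defaultdict(set)
    for unit in units:
        if (unit.attribute is not None and unit.value is not None
                and unit.polarity is not None):
            polarities[(unit.attribute, unit.value)].add(unit.polarity)
    return {pair: pols for pair, pols in polarities.items() if len(pols) > 1}


# Invented example data, not taken from the corpus.
units = [
    SentimentUnit("camera A", "weight", "light", "easy to carry", "positive"),
    SentimentUnit("camera B", "weight", "light", "feels cheap", "negative"),
    SentimentUnit(None, "battery", "long-lasting", None, None),
]
```

Keeping `value` separate from `evaluation` is what lets the same attribute-value pair ("weight", "light") carry opposite polarities in the invented data, which is the kind of case the abstract says was examined.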
Journal
IPSJ SIG Notes 2008(90), 99-106, 2008-09-17
Information Processing Society of Japan (IPSJ)