Distinguishing between People on the Web with the Same First and Last Name by Real-world Oriented Web Mining (実世界指向Webマイニングによる同姓同名人物の分離) [in Japanese]

Abstract

This paper proposes real-world oriented Web mining, a technique for extracting knowledge from the Web, which is regarded as a huge database. Whereas conventional mining techniques extract characteristics of data mostly through statistical analysis, the proposed technique interprets data from a real-world oriented point of view. More concretely, it examines how real-world entities appear in the data and what relationships they form with one another. This idea was applied to identifying people on the Web, specifically to distinguishing between people with the same first and last name. The task is to classify the Web pages on which a given person's name appears into groups, each of which corresponds to one person in the real world. With the proposed technique, pages were processed correctly at a rate of more than 90% on average.
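The abstract does not spell out the grouping procedure, so the following is only a minimal sketch of the task it describes, not the authors' method. It assumes each page has already been annotated with a set of real-world entities (organizations, places, and so on) and that two pages sharing at least `min_shared` entities refer to the same person. The function name, the threshold, and the toy data are all hypothetical.

```python
from collections import defaultdict


def group_pages_by_shared_entities(page_entities, min_shared=1):
    """Group pages that share entities, directly or through other pages.

    page_entities: dict mapping a page id to a set of entity labels.
    min_shared: assumed threshold; pages sharing at least this many
                entities are treated as being about the same person.
    """
    pages = list(page_entities)
    parent = {p: p for p in pages}  # union-find forest over page ids

    def find(p):
        # Follow parent links to the root, compressing the path as we go.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Link every pair of pages whose entity sets overlap enough.
    for i, a in enumerate(pages):
        for b in pages[i + 1:]:
            if len(page_entities[a] & page_entities[b]) >= min_shared:
                union(a, b)

    # Collect pages by their root; each group stands for one person.
    groups = defaultdict(list)
    for p in pages:
        groups[find(p)].append(p)
    return sorted(sorted(g) for g in groups.values())


# Toy input: five pages that all mention the same (hypothetical) name.
pages = {
    "page1": {"University A", "databases"},
    "page2": {"University A", "Society B"},
    "page3": {"City C", "baseball"},
    "page4": {"baseball", "Team D"},
    "page5": {"City E", "pottery"},
}
groups = group_pages_by_shared_entities(pages)
print(groups)
# → [['page1', 'page2'], ['page3', 'page4'], ['page5']]
```

Grouping by connected components is a deliberately simple choice: one shared entity is enough to merge two pages, which a real system would likely temper with entity weighting or a stricter threshold.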

Journal

  • 情報処理学会論文誌データベース(TOD) (IPSJ Transactions on Databases)

    情報処理学会論文誌データベース(TOD) 46(SIG8(TOD26)), 26-36, 2005-06-15

    Information Processing Society of Japan (IPSJ)

References:  13

Cited by:  4

Codes

  • NII Article ID (NAID)
    110002768776
  • NII NACSIS-CAT ID (NCID)
    AA11464847
  • Text Lang
    JPN
  • Article Type
    Journal Article
  • ISSN
    1882-7799
  • NDL Article ID
    7966205
  • NDL Call No.
    Z74-C192
  • Data Source
    CJP  CJPref  NDL  NII-ELS  IPSJ 