Combining Page Group Structure and Content for Roughly Filtering Researchers' Homepages with High Recall (Special Issue on Fusions of Information Contents and Information Technologies)

    • WANG YUXIN
    • The Graduate University of Advanced Studies (SOKENDAI)
    • OYAMA KEIZO
    • The Graduate University of Advanced Studies (SOKENDAI) : National Institute of Informatics (NII)

Abstract

This paper proposes a method for gathering researchers' homepages (or entry pages) that narrows down the candidates while keeping recall very high. The method applies new, simple, and effective page group models that exploit the mutual relations between the structure and content of a page group. First, 12 property-based keyword lists corresponding to researchers' common properties are created, and each is labeled as either organization-related or other. Next, several page group models (PGMs) are introduced that take the link structure and URL hierarchy into consideration. Because applying PGMs generally introduces considerable noise, modified PGMs with two original techniques are introduced to reduce it. Then, based on the PGMs, keywords are propagated to a potential entry page from its surrounding pages, composing a virtual entry page. Finally, the virtual entry pages whose scores reach a threshold are selected. The effectiveness of the method is shown by comparing it with a single-page-based method in experiments using a 100 GB web data set and a manually created sample data set.
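
The keyword propagation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the three keyword lists, the single URL-hierarchy rule in `in_page_group`, the scoring by number of matched lists, and all names and URLs below are assumptions made for illustration. The paper's 12 lists, its organization-related labeling, its link-structure models, and its two noise-reduction techniques are not reproduced here.

```python
from urllib.parse import urlparse

# Hypothetical stand-ins for the property-based keyword lists
# (the paper defines 12; their contents are not given in the abstract).
KEYWORD_LISTS = {
    "affiliation": {"university", "laboratory", "department"},
    "publication": {"publications", "papers"},
    "position": {"professor", "researcher"},
}


def matched_lists(text):
    """Return the names of the keyword lists that have a keyword in the text."""
    words = set(text.lower().split())
    return {name for name, keywords in KEYWORD_LISTS.items() if words & keywords}


def in_page_group(entry_url, other_url):
    """Assumed page group rule: same host, and the other page lies in or
    below the directory of the candidate entry page."""
    entry, other = urlparse(entry_url), urlparse(other_url)
    entry_dir = entry.path.rsplit("/", 1)[0] + "/"
    return (
        other_url != entry_url
        and entry.netloc == other.netloc
        and other.path.startswith(entry_dir)
    )


def virtual_entry_score(entry_url, pages):
    """Propagate matched lists from the surrounding pages to the entry page
    and score the resulting virtual entry page by how many lists it matches."""
    matched = matched_lists(pages[entry_url])
    for url, text in pages.items():
        if in_page_group(entry_url, url):
            matched |= matched_lists(text)
    return len(matched)


# Made-up sample data: URL -> page text.
pages = {
    "http://example.org/~wang/index.html": "Welcome professor",
    "http://example.org/~wang/pubs.html": "Selected publications and papers",
    "http://example.org/~wang/lab.html": "Our laboratory",
    "http://example.org/news/today.html": "campus news",
}

entry = "http://example.org/~wang/index.html"
single_page_score = len(matched_lists(pages[entry]))
group_score = virtual_entry_score(entry, pages)
```

In this sample, the entry page alone matches only one list, while its virtual entry page matches all three, so a threshold between those values would keep the page only under the page-group approach. A rule this loose also pulls in unrelated pages that share a directory, which is the kind of noise the abstract says the modified PGMs are meant to reduce.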

Journal

IPSJ Transactions on Databases

IPSJ Transactions on Databases 47(SIG_8(TOD_30)), 11-23, 2006-06-15

Information Processing Society of Japan (IPSJ)

References:  15


Cited by:  1



Codes

  • NII Article ID (NAID) :
    110006390934
  • NII NACSIS-CAT ID (NCID) :
    AA11464847
  • Text Lang :
    ENG
  • Article Type :
    Journal Article
  • ISSN :
    0387-5806
  • NDL Article ID :
    8011073
  • NDL Source Classification :
    ZM13 (Science and Technology -- General Science and Technology -- Data Processing and Computers)
  • NDL Call No. :
    Z74-C192
  • Databases :
    CJP  CJPref  NDL  NII-ELS