Estimation of User's Interests and WWW Retrieval Based on Topic Extraction from HTML Files (HTMLファイルからのトピック抽出に基づく興味推定とWWW検索)

Bibliographic Information

Alternative titles
  • Estimation of User's Interests and WWW Retrieval Based on Topic Extraction from HTML Files
  • HTML ファイル カラ ノ トピック チュウシュツ ニ モトヅク キョウミ スイテイ ト WWW ケンサク

Abstract

In recent years, many methods that assist WWW retrieval based on a user's interests have been proposed. However, it is generally difficult to estimate a user's interests directly from HTML files, since a file often contains multiple topics, some of which may not interest the user. In this paper, we propose a method for estimating a user's interests and for WWW retrieval, both of which are based on topics extracted from HTML files. The characteristics of the method are as follows: (1) topics in an HTML file are extracted by identifying repetitive sequences of its HTML tags; (2) the user's interests are estimated by clustering topics extracted from HTML files that contain portions of interest to the user; and (3) the accuracy of both interest estimation and WWW retrieval is improved by incorporating retrieved topics into the case base as positive or negative cases, as specified by the user. Experimental results on 151 HTML files show that the method improves precision in both interest estimation and WWW retrieval, compared with a method that does not extract topics.
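
As an illustration of characteristic (1), the following is a minimal Python sketch of splitting an HTML file into topics at a repeated tag sequence. It is not the authors' algorithm, which the abstract does not specify. The sketch assumes that the most frequent run of n consecutive start tags marks the beginning of each topic; the names TagTextCollector and extract_topics and the parameter n are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Record each start tag together with the text that follows it."""

    def __init__(self):
        super().__init__()
        self.items = []  # list of [tag, text] pairs in document order

    def handle_starttag(self, tag, attrs):
        self.items.append([tag, ""])

    def handle_data(self, data):
        # Attach non-blank text to the most recently opened tag.
        if self.items and data.strip():
            self.items[-1][1] += data.strip() + " "


def extract_topics(html, n=2):
    """Split a page into topics at the most frequent run of n start tags.

    Assumption (not from the paper): a tag sequence that repeats at least
    twice marks the start of each topic block.
    """
    collector = TagTextCollector()
    collector.feed(html)
    collector.close()
    tags = [tag for tag, _ in collector.items]

    # Count every run of n consecutive start tags.
    grams = Counter(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))
    if not grams:
        return []
    pattern, count = grams.most_common(1)[0]
    if count < 2:
        return []  # nothing repeats, so no topic boundary is assumed

    topics = []
    current = None      # text pieces of the topic being built
    next_allowed = 0    # prevents overlapping matches of the pattern
    for i in range(len(tags)):
        if i >= next_allowed and tuple(tags[i:i + n]) == pattern:
            if current is not None:
                topics.append(current)
            current = []
            next_allowed = i + n
        if current is not None:
            text = collector.items[i][1].strip()
            if text:
                current.append(text)
    if current:
        topics.append(current)
    return [" ".join(pieces) for pieces in topics]
```

For a page whose body alternates h2 and p elements, each heading and the paragraph after it would form one topic under this assumption. Text that appears before the first occurrence of the repeated sequence is discarded in this sketch.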
