Interactive Document Retrieval Using Support Vector Machines

Author

    • 村田, 博士 (ムラタ, ヒロシ; Murata, Hiroshi)

Bibliographic information

Title

Interactive Document Retrieval Using Support Vector Machines (サポートベクターマシンを用いた対話的文書検索)

Author name

村田, 博士 (Murata, Hiroshi)

Author name (alternate form)

ムラタ, ヒロシ

Degree-granting institution

総合研究大学院大学 (The Graduate University for Advanced Studies, SOKENDAI)

Degree

Doctorate (Informatics)

Degree number

甲第1510号 (Kō No. 1510)

Date conferred

2012-03-23

Notes and abstract

Doctoral dissertation

We propose a heuristic for selecting which documents to display to the user in interactive document retrieval; it improves both learning efficiency and retrieval efficiency. The heuristic is based on the extreme imbalance between positive and negative examples.

We conducted experiments to evaluate the effectiveness of the proposed heuristic for active learning, using a set of articles widely used at the Text Retrieval Conference (TREC). For comparison, two conventional methods were adopted: Rocchio-based relevance feedback, and the conventional selection rule for SVM-based active learning. The results confirmed that the proposed system outperformed both.

In interactive document retrieval, displayed documents are ordered by their computed degree of relevance. In SVM-based interactive document retrieval, the degree of relevance is the signed distance from the optimal hyperplane. It has not been made clear how this signed distance relates to the vector space model used by the Rocchio-based method. Through a comparative analysis of the two relevance evaluations, we show that SVM-based retrieval is closely related to the conventional Rocchio-based method.

The analysis shows that the equation for the weight vector in SVM-based relevance feedback is equivalent to the query-vector update equation of the Rocchio-based method. The SVM-based degree of relevance, however, evaluates the product of the target document vector's norm and its cosine similarity with the weight vector, whereas the Rocchio-based degree of relevance evaluates only the cosine similarity with the query vector.
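The correspondence described above can be illustrated with a small sketch. It assumes dense term-weight vectors, textbook Rocchio parameters (α = 1, β = 0.75, γ = 0.15, an assumption rather than values from the thesis), and hypothetical helper names (`rocchio`, `svm_weight`, `svm_score`). The SVM dual coefficients are supplied by hand instead of being learned, so this shows only the structural equivalence of the weight-vector and query-update formulas, not the thesis's training procedure.

```python
import math

# Documents and queries are dense term-weight vectors over a small,
# hypothetical vocabulary (plain Python lists, no external libraries).


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def cosine(u, v):
    """Cosine similarity; assumes neither vector is all zeros."""
    return dot(u, v) / (norm(u) * norm(v))


def rocchio(q0, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Textbook Rocchio update: q' = alpha*q0 + beta*mean(R) - gamma*mean(N).

    The default alpha/beta/gamma are common textbook values, assumed here.
    """
    dim = len(q0)
    r_mean = [sum(d[i] for d in relevant) / len(relevant) for i in range(dim)]
    n_mean = [sum(d[i] for d in nonrelevant) / len(nonrelevant) for i in range(dim)]
    return [alpha * q0[i] + beta * r_mean[i] - gamma * n_mean[i] for i in range(dim)]


def svm_weight(vectors, labels, dual_coefs):
    """Linear-SVM weight vector in dual form: w = sum_i a_i * y_i * x_i.

    labels are +1 (relevant) / -1 (non-relevant); dual_coefs (a_i >= 0) are
    given by hand here instead of being learned by an SVM solver.
    """
    dim = len(vectors[0])
    return [sum(a * y * x[i] for x, y, a in zip(vectors, labels, dual_coefs))
            for i in range(dim)]


def svm_score(w, b, x):
    """Signed distance of document x from the hyperplane w.x + b = 0."""
    return (dot(w, x) + b) / norm(w)


if __name__ == "__main__":
    R = [[1.0, 0.0, 1.0], [0.5, 0.5, 0.0]]  # judged relevant (toy data)
    N = [[0.0, 1.0, 1.0]]                   # judged non-relevant (toy data)
    beta, gamma = 0.75, 0.15
    # Uniform per-class coefficients: beta/|R| for positives, gamma/|N| for negatives.
    coefs = [beta / len(R)] * len(R) + [gamma / len(N)] * len(N)
    w = svm_weight(R + N, [1] * len(R) + [-1] * len(N), coefs)
    q = rocchio([0.0, 0.0, 0.0], R, N, alpha=0.0, beta=beta, gamma=gamma)
    print(w)
    print(q)
    x = [0.2, 0.4, 0.9]
    # With b = 0 the signed distance equals ||x|| * cos(w, x).
    print(svm_score(w, 0.0, x), norm(x) * cosine(w, x))
```

With uniform per-class coefficients, `svm_weight` reproduces the feedback terms of `rocchio` (taking α = 0, since the SVM weight vector has no separate original-query term). Real SVM dual coefficients differ per example and are zero for non-support vectors, so the correspondence is structural rather than numerical. The last line shows why the two rankings differ: for b = 0 the signed distance is ‖x‖·cos(w, x), so each document's own norm scales its score, whereas the Rocchio-based method ranks by cosine similarity alone.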

From this finding, we propose a cosine kernel, equivalent to cosine similarity, that is suitable for SVM-based interactive document retrieval. The effectiveness of a method using the proposed cosine kernel was confirmed experimentally by comparing it with conventional relevance feedback for the Boolean, term frequency (TF), and term frequency-inverse document frequency (TFIDF) representations of document vectors. The experimental results on a Text Retrieval Conference data set show that the cosine kernel is effective for all three document representations, especially the TF representation.
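The cosine kernel described above can be sketched as K(x, y) = x·y / (‖x‖‖y‖), which equals a linear kernel applied to L2-normalized vectors. The sketch below builds Boolean, TF, and TFIDF vectors for a toy corpus and evaluates that kernel. The idf formula log(N/df), the function names, and the corpus are illustrative assumptions; the thesis's exact weighting scheme is not stated here.

```python
import math
from collections import Counter


def boolean_vec(tokens, vocab):
    """1.0 if the term occurs in the document, else 0.0."""
    present = set(tokens)
    return [1.0 if t in present else 0.0 for t in vocab]


def tf_vec(tokens, vocab):
    """Raw term frequency of each vocabulary term."""
    counts = Counter(tokens)
    return [float(counts[t]) for t in vocab]


def tfidf_vec(tokens, vocab, corpus):
    """TF multiplied by idf = log(N / df), one common idf variant (assumed)."""
    counts = Counter(tokens)
    n_docs = len(corpus)
    vec = []
    for t in vocab:
        df = sum(1 for doc in corpus if t in doc)
        idf = math.log(n_docs / df) if df else 0.0
        vec.append(counts[t] * idf)
    return vec


def linear_kernel(x, y):
    return sum(a * b for a, b in zip(x, y))


def normalize(x):
    """Scale x to unit L2 norm (a zero vector is returned unchanged)."""
    n = math.sqrt(linear_kernel(x, x))
    return [a / n for a in x] if n > 0 else list(x)


def cosine_kernel(x, y):
    """K(x, y) = x.y / (||x|| ||y||): a linear kernel on L2-normalized vectors."""
    return linear_kernel(normalize(x), normalize(y))


def gram(xs, ys):
    """Kernel matrix K[i][j] = cosine_kernel(xs[i], ys[j])."""
    return [[cosine_kernel(x, y) for y in ys] for x in xs]


if __name__ == "__main__":
    corpus = [["svm", "retrieval", "svm"], ["rocchio", "retrieval"], ["svm", "kernel"]]
    vocab = ["svm", "retrieval", "rocchio", "kernel"]
    for name, build in [("Boolean", lambda d: boolean_vec(d, vocab)),
                        ("TF", lambda d: tf_vec(d, vocab)),
                        ("TFIDF", lambda d: tfidf_vec(d, vocab, corpus))]:
        vectors = [build(d) for d in corpus]
        print(name, gram(vectors, vectors))
```

Under this kernel every mapped document has unit norm, so the SVM score becomes ‖w‖·cos(w, x) + b and documents are ranked by cosine similarity with the weight vector, which is the property the abstract links back to the Rocchio-based method. To use it with a library SVM, a Gram-matrix function of this shape would need to be adapted to that library's kernel interface (for example, returning a NumPy array).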

Identifiers

  • NII Article ID (NAID)
    • 500000564010
  • NII Author ID (NRID)
    • 8000000566233
  • Language code of the main text
    • jpn
  • NDL Bibliographic ID
    • 024027665
  • Data sources
    • Institutional repository
    • NDL-OPAC