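A Score Normalization Method Considering the Concept of Information Content for Merging Search Results (検索結果を統合するための情報量の概念を考慮したスコア正規化手法)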

Yu Suzuki (鈴木 優), Kenji Hatano (波多野 賢治), Masatoshi Yoshikawa (吉川 正俊), Shunsuke Uemura (植村 俊亮)

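Metasearch engines send a query to multiple retrieval systems and merge their results to improve accuracy. They are widely used to search Web documents, images, and other content. In this approach, each retrieval system assigns a score to every retrieval target object, and a combining function merges these scores. The mean, variance, sum, and other statistics of the scores differ from one retrieval system to another, so the scores must be normalized before they are combined. Conventional score normalization methods, however, cannot properly normalize score sets whose scores are unevenly distributed across the objects. As a result, they sometimes fail to produce an optimal normalization. In this paper, we propose a score normalization method that accounts for how the retrieval target objects are distributed over the scores computed by each retrieval system. Our premise concerns two objects with the same score. The one whose score lies in a range crowded with other objects should end up with a relatively lower value than the one whose score lies in a sparsely populated range. In other words, normalized scores are lowered in score ranges that contain many objects and raised in score ranges that contain few objects. This makes it possible to normalize a result list dominated by high scores and one dominated by low scores into scores that can be merged with each other.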
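The pipeline described above (normalize each system's scores, then merge them with a combining function) can be illustrated with a minimal Python sketch. It uses min-max normalization followed by CombSUM fusion. Both are assumptions chosen as common baselines. The abstract does not say which conventional normalization methods or which combining function the authors used, and the `system_a` and `system_b` scores are hypothetical data.

```python
from collections import defaultdict


def min_max(scores):
    """Rescale one system's scores linearly to [0, 1] (a common baseline)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal: there is no spread to rescale.
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def comb_sum(runs):
    """Normalize each system's run, then sum the scores per object (CombSUM)."""
    fused = defaultdict(float)
    for run in runs:
        for doc, s in min_max(run).items():
            fused[doc] += s
    # Highest fused score first.
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical raw scores from two systems that use very different scales.
system_a = {"d1": 12.0, "d2": 30.0, "d3": 18.0}
system_b = {"d1": 0.9, "d2": 0.2, "d3": 0.5}
print(comb_sum([system_a, system_b]))
```

Min-max maps each system's minimum to 0 and its maximum to 1, whatever the distribution of the scores in between. A system whose scores cluster near the top therefore still hands out many values near 1. The abstract says conventional methods cannot correct this kind of skew.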

In metasearch engines, several individual retrieval systems each calculate raw relevance scores for a retrieval target. The metasearch engine then combines these raw relevance scores into a similarity between the user's query and the retrieval target. Before combining them, the metasearch engine should normalize the raw relevance scores so that they are comparable, because raw relevance scores from different systems are not always suitable for direct combination. That is, the same normalized relevance score should indicate the same similarity between the user's query and a retrieval target, even if the scores were calculated by different retrieval systems. Many normalization methods have been proposed, but they do not always normalize raw relevance scores adequately. In this paper, we propose a method for normalizing raw relevance scores using Shannon's information measure. With this measure, high raw relevance scores are converted to low normalized relevance scores if a retrieval system assigns high raw scores to many targets. Conversely, high raw relevance scores are converted to high normalized relevance scores if a retrieval system assigns high raw scores to only a few targets. We assume that a retrieval target is relevant to the user's query if both its raw relevance score and its Shannon information measure are high. Our experimental results show that our normalization method is more accurate than the other methods we compared. From these results, we conclude that the proposed assumption holds for relevance score normalization.
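The following minimal Python sketch shows one way the idea of weighting scores by Shannon information might be realized. It is not the authors' formula, which the abstract does not give. The equal-width binning, the hypothetical `bins` parameter, and the product of a min-max score with the self-information -log2 p of the score's bin are all assumptions. Here p is the fraction of that system's objects that fall in the same bin. The product is large only when both the raw score and the information measure are high, which mirrors the stated assumption about relevance.

```python
import math


def info_weighted_normalize(scores, bins=4):
    """Hypothetical sketch: weight each min-max score by -log2 p(bin), where
    p(bin) is the fraction of this system's objects whose score falls in the
    same equal-width bin. Crowded score ranges receive a smaller weight."""
    values = list(scores.values())
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid a zero width when all scores are equal

    def bin_of(s):
        # Clamp so that the maximum score lands in the last bin.
        return min(int((s - lo) / width), bins - 1)

    counts = [0] * bins
    for s in values:
        counts[bin_of(s)] += 1

    n = len(values)
    out = {}
    for doc, s in scores.items():
        raw = (s - lo) / (hi - lo) if hi > lo else 0.0
        info = -math.log2(counts[bin_of(s)] / n)  # self-information of the bin
        out[doc] = raw * info
    return out


# Hypothetical scores: system A rates most objects highly, system B rates only one.
dense = {"d1": 0.95, "d2": 0.93, "d3": 0.97, "d4": 0.1}
sparse = {"e1": 0.95, "e2": 0.1, "e3": 0.12, "e4": 0.11}
print(info_weighted_normalize(dense)["d3"], info_weighted_normalize(sparse)["e1"])
```

Both top-ranked objects have a min-max score of 1.0. The top object of `sparse` is the only one in its bin, however, so it receives a larger normalized value than the top object of `dense`, which shares its bin with two other objects. Unlike min-max, these values are not bounded to [0, 1]. A real implementation would need to decide how to scale them and how to choose the bins.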
