Construction of the Large-Scale Test Collection NTCIR-2: Effects of Additional Interactive Searches and Cross-Lingual Pooling

栗山 和子, 吉岡 真治, 神門 典子

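The relevance judgments (the list of relevant documents) of the large-scale test collection NTCIR-2 were created by the pooling method from the search results submitted by the participants of NTCIR Workshop 2. This paper examines how cross-lingual pooling and additional searches with an interactive retrieval system, both carried out while the NTCIR-2 relevance judgments were being built, affect the relative evaluation of retrieval systems with NTCIR-2. It also investigates whether NTCIR-2, built by pooling, is effective for evaluating systems that did not take part in the pooling. We ran evaluation experiments using the NTCIR-2 relevance judgments and the results submitted by the teams participating in NTCIR Workshop 2. First, we evaluated the submitted results using the final list of relevant documents F and the list F − I, obtained by removing from F the documents found only by the additional interactive searches I. Next, we pooled the submitted results separately for each subtask and evaluated the results using each per-subtask pool as the list of relevant documents. Finally, to simulate the evaluation of a system that did not take part in the pooling, we evaluated each system's submitted results using F − S, obtained by removing from F the relevant documents S that appear only in that system's own submitted results. In every case, the submitted results were ranked by mean average precision, and this ranking was used as the relative evaluation. The relative ranking of the submitted results hardly changed, whichever list was used as the list of relevant documents. Moreover, removing the relevant documents found only by a given system, that is, treating the system as if it had not taken part in the pooling, had almost no effect on the evaluation of that system's results or on its relative evaluation against the other systems that did take part. These results confirm the reliability of a test collection built by the pooling method.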

The purposes of this study are to examine whether the relevance judgments of the test collection NTCIR-2, made by the pooling method and additional interactive searches, affect the relative evaluation of IR systems, and to investigate whether NTCIR-2 is effective for evaluating IR systems whose search results were not used for the pooling. We carried out experiments using different lists of relevance judgments and the search results submitted for the test of the 2nd NTCIR Workshop. First, we evaluated the search results using the list of the final relevance judgments F of NTCIR-2 and F − I, that is, F without the unique relevant documents found by the additional interactive searches I. Second, we made pools from the search results for each of the sub-tasks and evaluated the search results using the pools as lists of relevance judgments. Third, we evaluated the search results using F − S, that is, F without the unique relevant documents found by an IR system S, in order to simulate the evaluation of an IR system that was not used for the pooling. Almost the same rankings of the search results were produced when the pools were used as lists of relevance judgments for system evaluation. When the search results of an IR system were not used for the pooling, there was very little effect on the evaluation of the system itself or on the relative evaluation between it and the other systems. Our results therefore verified the reliability, as an evaluation tool, of a test collection based on the pooling method.
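
The third experiment, which evaluates a system against F with that system's unique relevant documents removed, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the dictionary layout (topic ID to ranked document IDs for a run, topic ID to a set of relevant document IDs for the judgments), and the simplification that each system contributes one run whose whole ranked list counts as pooled are all assumptions made for illustration. The abstract does not state the pool depth or how multiple runs per team were handled.

```python
from typing import Dict, List, Set

Run = Dict[str, List[str]]    # topic id -> ranked document ids (assumed layout)
Qrels = Dict[str, Set[str]]   # topic id -> relevant document ids (assumed layout)


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Non-interpolated average precision for one topic."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank   # precision at this relevant document
    return total / len(relevant)


def mean_average_precision(run: Run, qrels: Qrels) -> float:
    """Mean of average precision over all judged topics."""
    if not qrels:
        return 0.0
    scores = [average_precision(run.get(t, []), rel) for t, rel in qrels.items()]
    return sum(scores) / len(scores)


def unique_relevant(system: str, runs: Dict[str, Run], qrels: Qrels) -> Qrels:
    """Relevant documents retrieved by `system` and by no other system (the set S)."""
    unique: Qrels = {}
    for topic, relevant in qrels.items():
        others: Set[str] = set()
        for name, run in runs.items():
            if name != system:
                others.update(run.get(topic, []))
        own = set(runs[system].get(topic, []))
        unique[topic] = (relevant & own) - others
    return unique


def qrels_without(qrels: Qrels, removed: Qrels) -> Qrels:
    """Per-topic set difference, e.g. F minus S."""
    return {t: rel - removed.get(t, set()) for t, rel in qrels.items()}


def rank_systems(runs: Dict[str, Run], qrels: Qrels) -> List[str]:
    """System names ordered by descending mean average precision."""
    return sorted(runs, key=lambda s: mean_average_precision(runs[s], qrels),
                  reverse=True)
```

Under these assumptions, the scores and rankings obtained with the full judgments can be compared with those obtained from `qrels_without(qrels, unique_relevant(name, runs, qrels))` for each system in turn.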
