Ad Hoc Search for Statistical Data Based on Refinement and Augmentation of Retrieved Documents and Query Expansion

IPSJ Open Access

Bibliographic Information

Other Title
  • Ad Hoc Search for Statistical Data Based on Refinement and Augmentation of Retrieved Documents and Query Expansion

Abstract

In this paper, we propose an ad hoc search method for statistical data based on narrowing down and augmenting the documents to be searched and on query expansion. In recent years, infrastructure for using open data has been developed worldwide so that public data held by governments and other organisations can be used effectively for everyday life and society. As a result, ad hoc search infrastructure for statistical data, which is one kind of open data, is becoming increasingly important. Statistical data is generally presented in tabular form and therefore has different characteristics from the documents targeted by conventional ad hoc search, which are written mainly as text. We propose a ranking method consisting of three parts: (1) a method that classifies the target documents and the queries into categories to narrow down the candidate documents, (2) a method that augments the target documents by extracting, from the statistical tables themselves, information that is not in the metadata of the statistical data, and (3) a method that uses expansion terms similar to the query. In the experiments, we compare the performance of various combinations of these components and identify the best-performing combination.
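
The three components named in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy tables, the `EXPANSIONS` dictionary, the field names, and the term-count score are hypothetical, and the abstract does not specify the classifier, the table extraction, or the ranking function the authors actually use. The query category is passed in directly, standing in for whatever query classification the method performs.

```python
from collections import Counter

# Hypothetical toy collection: each statistical table has a category,
# short metadata, and header cells taken from the table body.
DOCS = [
    {"id": "t1", "category": "population", "metadata": "census summary",
     "table_headers": "population by prefecture and age"},
    {"id": "t2", "category": "population", "metadata": "population estimates",
     "table_headers": "year total"},
    {"id": "t3", "category": "economy", "metadata": "population of firms by prefecture",
     "table_headers": "employees sales"},
]

# Hypothetical expansion dictionary mapping a query term to similar terms.
EXPANSIONS = {"inhabitants": ["population"], "region": ["prefecture"]}


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplification)."""
    return text.lower().split()


def augment(doc):
    """Component (2): add terms from the table body to the metadata terms."""
    return tokenize(doc["metadata"]) + tokenize(doc["table_headers"])


def expand(query_terms):
    """Component (3): append expansion terms similar to the query terms."""
    expanded = list(query_terms)
    for term in query_terms:
        expanded.extend(EXPANSIONS.get(term, []))
    return expanded


def rank(query, query_category, docs):
    """Rank documents by a simple term-count score (an assumed stand-in)."""
    terms = expand(tokenize(query))
    # Component (1): keep only documents in the same category as the query.
    candidates = [d for d in docs if d["category"] == query_category]
    scored = []
    for doc in candidates:
        counts = Counter(augment(doc))
        score = sum(counts[t] for t in terms)
        scored.append((score, doc["id"]))
    # Highest score first; ties broken by document id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored


print(rank("inhabitants by region", "population", DOCS))
```

In this toy setup, table `t1` matches the query only through its table headers and the expansion terms, which is the situation the augmentation and expansion components are meant to address, while `t3` is excluded by the category filter despite sharing terms with the expanded query.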
