Compressing Inverted Index Using Optimal FastPFOR

Abstract

Indexing plays an important role in storing and retrieving data in an Information Retrieval System (IRS). The inverted index is the most frequently used indexing structure in IRS. Compression schemes are used to reduce the size of the index and to retrieve data efficiently, because retrieving compressed data is faster than retrieving uncompressed data. High-speed compression schemes can therefore improve the performance of an IRS. In this paper, we study and analyze various compression techniques for 32-bit integer sequences. Previously proposed compression schemes achieve either better compression rates or fast decoding, so their overall decompression speed (disk access + decoding) might not be better. We propose a new compression technique, called Optimal FastPFOR, based on FastPFOR. The proposed method uses a better integer representation and storage structure for compressing the inverted index, in order to improve decompression performance. We used the TREC data collection in our experiments, and the results show that the proposed code can achieve better compression and decompression than FastPFOR and other existing related compression techniques.

This is a preprint of an article intended for publication in the Journal of Information Processing (JIP). This preprint should not be cited. This article should be cited as: Journal of Information Processing Vol.23 (2015) No.2 (online), DOI http://dx.doi.org/10.2197/ipsjjip.23.185

Details

  • CRID
    1050282812882444800
  • NII Article ID
    110009884095
  • NII Bibliographic ID
    AN00116647
  • ISSN
    18827764
  • Web Site
    http://id.nii.ac.jp/1001/00122999/
  • Text language code
    en
  • Material type
    journal article
  • Data source type
    • IRDB
    • CiNii Articles
