Dimensionality Reduction of Vector Space Model for Information Retrieval using Simple Principal Component Analysis

Bibliographic Information

Other Title
  • Simple PCAを用いたベクトル空間情報検索モデルの次元削減 (English: Dimensionality Reduction of the Vector Space Information Retrieval Model Using Simple PCA)


Abstract

In this paper, we propose using Simple Principal Component Analysis (SPCA) for dimensionality reduction in the vector space model of information retrieval. SPCA is a fast, data-oriented method that does not require computing the variance-covariance matrix. Because SPCA estimates principal components iteratively, we also propose a criterion for determining convergence; this criterion can be used to find the optimum number of iterations for each principal component. Experiments show that the SPCA-based method improves on the conventional SVD-based method despite its low computational cost. We attribute this advantage to SPCA's iterative procedure, which resembles clustering methods such as k-means. In addition, the proposed method, which orthogonalizes the basis vectors, achieves much higher accuracy than the conventional random projection method based on k-means clustering.
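The idea of estimating components iteratively without a variance-covariance matrix can be illustrated with a minimal sketch. It assumes the sign-thresholded update usually given for Simple PCA: each document vector is flipped to the same side as the current estimate, the flipped vectors are summed, and the sum is normalized. The stopping rule (stop once the sign pattern no longer changes), the absence of centering, the Gram-Schmidt orthogonalization, the deflation step, the toy term-document data, and all function names (`simple_pca`, `project`, `cosine`) are illustrative assumptions. They are not the paper's convergence criterion or implementation.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalize(v):
    n = math.sqrt(dot(v, v))
    return [x / n for x in v] if n > 0 else v[:]


def simple_pca(docs, k, max_iter=100, seed=0):
    """Hypothetical sketch: estimate k orthonormal basis vectors with a
    sign-thresholded Simple PCA update. Documents are used as-is (no centering,
    an assumption), and k should not exceed the rank of the data."""
    rng = random.Random(seed)
    dim = len(docs[0])
    residual = [[float(x) for x in d] for d in docs]  # deflated copy of the data
    basis = []
    for _ in range(k):
        a = normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])
        signs = None
        for _ in range(max_iter):
            # Assign each document a sign relative to a (no covariance matrix is formed).
            new_signs = [1.0 if dot(x, a) >= 0.0 else -1.0 for x in residual]
            # Assumed stopping rule: an unchanged sign pattern means a is a fixed point.
            if new_signs == signs:
                break
            signs = new_signs
            a = normalize([sum(s * x[i] for s, x in zip(signs, residual))
                           for i in range(dim)])
        # Gram-Schmidt against earlier components keeps the basis orthogonal.
        for b in basis:
            c = dot(a, b)
            a = [ai - c * bi for ai, bi in zip(a, b)]
        a = normalize(a)
        basis.append(a)
        # Deflation: remove the component along a before estimating the next one.
        new_residual = []
        for x in residual:
            c = dot(x, a)
            new_residual.append([xi - c * ai for xi, ai in zip(x, a)])
        residual = new_residual
    return basis


def project(vec, basis):
    """Coordinates of a term vector in the reduced space."""
    return [dot(vec, b) for b in basis]


def cosine(u, v):
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    return dot(u, v) / (nu * nv) if nu and nv else 0.0


# Hypothetical toy term-document vectors (rows = documents, columns = terms).
docs = [[2.0, 1.0, 0.0, 0.0], [3.0, 2.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 2.0], [0.0, 0.0, 2.0, 3.0]]
basis = simple_pca(docs, k=2)
reduced = [project(d, basis) for d in docs]
query = project([1.0, 1.0, 0.0, 0.0], basis)
ranking = sorted(range(len(docs)), key=lambda j: cosine(query, reduced[j]), reverse=True)
print(ranking)
```

Retrieval then ranks documents by cosine similarity in the reduced space. Assigning a sign to every document and then recomputing the direction loosely resembles the assignment and update steps of k-means. As with k-means, different seeds can converge to different fixed points, so the printed ranking may vary with `seed`.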
