A Robust K-Means for Document Clustering

  • Yu, Hengjun
    Department of Communication Design Science, Kyushu University
  • Inoue, Kohei
    Department of Communication Design Science, Kyushu University
  • Hara, Kenji
    Department of Communication Design Science, Kyushu University
  • Urahama, Kiichi
    Department of Communication Design Science, Kyushu University

Bibliographic Information

Other Title
  • Two-step Variable Screening Method for the Mahalanobis-Taguchi Method with Small Training Data

Abstract

We propose a robust K-means algorithm for document clustering. The input is a document-term matrix, and documents are clustered according to the frequencies of the terms that occur in each of them. We obtain the robust variant by introducing a robust loss function into K-means clustering, and we also propose a feature transform method that improves document clustering performance. Experiments on one of the BBC datasets, which originates from BBC News, show that the proposed method makes K-means more robust to outliers and improves document clustering performance.
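The abstract does not say which robust loss is used, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a Huber-type loss on the Euclidean distance to the centroid, applied through iteratively reweighted centroid updates. The function names, the `delta` threshold, and the toy document-term matrix are all hypothetical, and the feature transform mentioned above is not modeled.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two term-frequency vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def huber_weight(dist, delta):
    """Reweighting factor for a Huber-type loss (an assumption, not from the paper):
    full weight within delta of the centroid, delta / dist beyond it."""
    return 1.0 if dist <= delta else delta / dist


def robust_kmeans(rows, init_centers, delta=1.0, n_iter=20):
    """K-means whose centroid update is a weighted mean that down-weights
    documents lying far from the current centroid."""
    centers = [list(c) for c in init_centers]
    labels = [0] * len(rows)
    for _ in range(n_iter):
        # Assignment step: each document goes to its nearest centroid.
        for i, row in enumerate(rows):
            labels[i] = min(range(len(centers)),
                            key=lambda k: sq_dist(row, centers[k]))
        # Update step: robustly weighted mean of each cluster's members.
        for k in range(len(centers)):
            members = [row for row, lab in zip(rows, labels) if lab == k]
            if not members:
                continue  # keep the old centroid if a cluster empties out
            weights = [huber_weight(math.sqrt(sq_dist(row, centers[k])), delta)
                       for row in members]
            total = sum(weights)
            centers[k] = [
                sum(w * row[j] for w, row in zip(weights, members)) / total
                for j in range(len(centers[k]))
            ]
    return labels, centers


# Hypothetical document-term matrix: rows are documents, columns are term counts.
# The last row is an outlier with an extreme count for the first term.
docs = [
    [5, 0, 0], [4, 1, 0], [5, 1, 0],   # documents dominated by term 0
    [0, 5, 1], [0, 4, 0], [1, 5, 0],   # documents dominated by term 1
    [30, 0, 0],                        # outlier
]
labels, centers = robust_kmeans(docs, [docs[0], docs[3]], delta=1.0)
```

In this toy example the outlier is assigned to the first cluster. An unweighted mean would place that cluster's first coordinate at 11.0, whereas the reweighted update keeps the centroid close to the three typical documents.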
