BOOTSTRAPPING K-MEANS CLUSTERING

Abstract

Independent observations X_1, X_2, …, X_n are drawn from a distribution F on R^d. To divide these observations into k clusters, first choose a vector of optimal cluster centers b_n = (b_{n1}, b_{n2}, …, b_{nk}) to minimize [numerical formula] as a function of a = (a_1, a_2, …, a_k), then assign each observation to its nearest cluster center. Each b_{nj} is the mean of the observations in its cluster. Pollard (1982) obtained a central limit theorem for the means of the k clusters. In this paper, it is shown that the bootstrap distribution of the centered b_n has the same limiting distribution; the argument rests on the asymptotic behavior of empirical processes on Vapnik-Chervonenkis classes in a triangular array setting. Advantages of the bootstrap method are discussed, and the performance of bootstrap confidence sets is compared with that of Pollard's confidence sets by Monte Carlo simulation.
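
The procedure described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes that the criterion omitted from this record is the usual within-cluster sum of squares, W_n(a) = (1/n) sum_i min_j ||X_i - a_j||^2, and it uses Lloyd iterations, which reach only a local minimizer rather than the optimal b_n. The function names, the warm start of each bootstrap run at b_n, and the permutation matching that undoes label switching are all hypothetical choices made for illustration.

```python
import itertools
import random


def sq_dist(x, a):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((xi - ai) ** 2 for xi, ai in zip(x, a))


def assign(points, centers):
    """Index of the nearest center for each point."""
    return [min(range(len(centers)), key=lambda j: sq_dist(x, centers[j]))
            for x in points]


def lloyd(points, init, n_iter=100):
    """Lloyd iterations from `init` (a local minimizer only, an assumption)."""
    centers = [tuple(c) for c in init]
    for _ in range(n_iter):
        labels = assign(points, centers)
        new = []
        for j, c in enumerate(centers):
            members = [x for x, lab in zip(points, labels) if lab == j]
            if members:
                # Each center becomes the mean of the observations in its cluster.
                new.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new.append(c)  # keep the center of an empty cluster unchanged
        if new == centers:
            break
        centers = new
    return centers


def align(centers, reference):
    """Reorder centers to best match `reference` (undoes label switching)."""
    best = min(itertools.permutations(centers),
               key=lambda p: sum(sq_dist(c, r) for c, r in zip(p, reference)))
    return list(best)


def bootstrap_centers(points, k, n_boot=200, seed=0):
    """Return b_n and the bootstrap draws of sqrt(n) * (b_n* - b_n)."""
    rng = random.Random(seed)
    n = len(points)
    b_n = lloyd(points, rng.sample(points, k))
    deviations = []
    for _ in range(n_boot):
        # Resample n observations with replacement from the data.
        resample = [points[rng.randrange(n)] for _ in range(n)]
        b_star = align(lloyd(resample, b_n), b_n)
        deviations.append([tuple(n ** 0.5 * (s - b) for s, b in zip(cs, cb))
                           for cs, cb in zip(b_star, b_n)])
    return b_n, deviations


# Synthetic two-cluster data in R^2 (illustrative only, not from the paper).
data_rng = random.Random(1)
data = ([(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(100)]
        + [(data_rng.gauss(10, 1), data_rng.gauss(10, 1)) for _ in range(100)])
b_n, deviations = bootstrap_centers(data, k=2, n_boot=100)
```

Under these assumptions, the empirical distribution of the entries of `deviations` plays the role of the bootstrap distribution of the centered b_n, from which confidence sets for the cluster centers could be formed.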

Journal

Journal of the Japanese Society of Computational Statistics

Journal of the Japanese Society of Computational Statistics 3(1), 1-14, 1990-12

Japanese Society of Computational Statistics

Codes

  • NII Article ID (NAID) :
    110001235576
  • NII NACSIS-CAT ID (NCID) :
    AA10823693
  • Text Lang :
    ENG
  • ISSN :
    0915-2350
  • Databases :
    NII-ELS 
