Mixed Factors Analysis: Unsupervized Statistical Discrimination with Kernel Feature Extraction

Abstract

<p>We address the problem of clustering and feature extraction for exceedingly high-dimensional data, referred to as n ≪ p data, where the dimensionality p of the feature space is much larger than the number of training samples n. For such sparsely distributed datasets, direct application of conventional model-based clustering may be impractical because of over-learning. To overcome this limitation, we developed the mixed factors model in Yoshida et al. (2004), which was originally aimed at solving the over-learning problem in the unsupervised discriminant analysis of gene expression profiles. The idea is to extract the feature variables involved in the underlying group structure and then train an unsupervised discriminative classifier using the extracted features, which are projected onto a lower-dimensional factor space. By alternating projection and clustering, the method seeks an optimal direction of projection such that the overlap of the projected clusters is small. One main purpose of this paper is to elucidate the statistical machinery of the feature extraction system offered by the mixed factors model. In particular, we establish its connection to Fisher's discriminant analysis and principal component analysis. After presenting some theoretical consequences, we also develop a more generic approach to clustering within the framework of kernel machine learning. This extension allows us to handle much more complicated cluster shapes and clustering on generic feature spaces.</p>
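The alternating projection-and-clustering scheme described in the abstract can be sketched in a simplified form. The following is a conceptual illustration only, not the authors' mixed factors model: it alternates a PCA-style projection with k-means clustering and then refits the projection from the cluster means (a Fisher-like between-cluster step); all function names and parameters here are hypothetical.

```python
import numpy as np

def kmeans(Z, k, iters=50, rng=None):
    """Plain k-means on the projected data Z (n x q)."""
    rng = rng or np.random.default_rng(0)
    centers = Z[rng.choice(len(Z), k, replace=False)]
    for _ in range(iters):
        # assign each point to its nearest center
        labels = np.argmin(((Z[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = Z[labels == j].mean(axis=0)
    return labels

def alternating_projection_clustering(X, k=3, q=2, rounds=5):
    """Conceptual sketch: alternate low-dimensional projection and clustering.

    Starts from a PCA projection, clusters in the projected factor space,
    then refits the projection toward directions spanned by the cluster
    means (emphasizing between-cluster scatter, in the spirit of Fisher's
    discriminant analysis). Hypothetical illustration, not the paper's model.
    """
    Xc = X - X.mean(axis=0)
    # initial projection: top-q principal component directions
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:q].T
    for _ in range(rounds):
        Z = Xc @ W                       # project onto the q-dim factor space
        labels = kmeans(Z, k)            # cluster in the projected space
        # refit projection from the (non-empty) cluster means
        means = np.stack([Xc[labels == j].mean(axis=0)
                          for j in range(k) if np.any(labels == j)])
        _, _, Vt = np.linalg.svd(means, full_matrices=False)
        W = Vt[:q].T                     # may have < q columns if few clusters
    return labels, W
```

In this sketch, the projection-refit step is what drives the clusters apart in the factor space: each round, the directions of projection move toward the subspace spanned by the current cluster means, so overlap between projected clusters tends to shrink.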
