Frequency-sensitive competitive learning for scalable balanced clustering on high-dimensional hyperspheres

IEEE TRANSACTIONS ON NEURAL NETWORKS (2004)

Journal

IEEE TRANSACTIONS ON NEURAL NETWORKS

Volume 15, Issue 3, Pages 702-719

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/TNN.2004.824416

Keywords

balanced clustering; expectation maximization (EM); frequency-sensitive competitive learning (FSCL); high-dimensional clustering; kmeans; normalized data; scalable clustering; streaming data; text clustering

Abstract

Competitive learning mechanisms for clustering generally perform poorly on very high-dimensional (>1000) data because of curse-of-dimensionality effects. In applications such as document clustering, it is customary to normalize the high-dimensional input vectors to unit length, and it is sometimes also desirable to obtain balanced clusters, i.e., clusters of comparable sizes. The spherical kmeans (spkmeans) algorithm, which normalizes the cluster centers as well as the inputs, has been successfully used to cluster normalized text documents in 2000+ dimensional space. Unfortunately, like regular kmeans and its soft expectation-maximization-based version, spkmeans tends to generate extremely imbalanced clusters in high-dimensional spaces when the desired number of clusters is large (tens or more). This paper first shows that the spkmeans algorithm can be derived from a certain maximum likelihood formulation using a mixture of von Mises-Fisher distributions as the generative model, and that it can in fact be considered a batch-mode version of (normalized) competitive learning. The proposed generative model is then adapted in a principled way to yield three frequency-sensitive competitive learning variants that are applicable to static data and produce high-quality, well-balanced clusters for high-dimensional data. Like kmeans, each iteration of all three algorithms is linear in the number of data points and in the number of clusters. A frequency-sensitive algorithm to cluster streaming data is also proposed. Experimental results on clustering of high-dimensional text data sets are provided to show the effectiveness and applicability of the proposed techniques.
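To make the baseline concrete, the following is a minimal sketch of batch spherical kmeans as the abstract describes it: inputs and cluster centers are both normalized to unit length, and points are assigned by cosine similarity. It is not the paper's code and does not implement the frequency-sensitive variants; the function names (`normalize_rows`, `spherical_kmeans`), the random initialization from data points, and the fixed iteration count are assumptions made for illustration.

```python
import numpy as np


def normalize_rows(X, eps=1e-12):
    """Scale each row to unit L2 norm (eps guards against division by zero)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.maximum(norms, eps)


def spherical_kmeans(X, k, n_iter=20, seed=0):
    """Illustrative batch spherical kmeans (assumed details, not the paper's code).

    Returns (labels, centers); every row of centers has unit length.
    """
    rng = np.random.default_rng(seed)
    X = normalize_rows(np.asarray(X, dtype=float))
    # Assumption: initialize centers from k distinct data points.
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # For unit vectors, the dot product equals cosine similarity.
        sims = X @ centers.T              # shape (n_points, k)
        labels = sims.argmax(axis=1)      # assign to most similar center
        for j in range(k):
            members = X[labels == j]
            if len(members):              # an empty cluster keeps its old center
                centers[j] = members.sum(axis=0)
        centers = normalize_rows(centers)  # project centers back to the sphere
    labels = (X @ centers.T).argmax(axis=1)
    return labels, centers
```

Each iteration computes one similarity matrix of size (number of points) by (number of clusters), which matches the linear per-iteration cost stated in the abstract. Inspecting `np.bincount(labels, minlength=k)` after a run is one simple way to see how balanced the resulting cluster sizes are.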
