4.8 Article

Detecting Meaningful Clusters From High-Dimensional Data: A Strongly Consistent Sparse Center-Based Clustering Approach

Journal

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TPAMI.2020.3047489

Keywords

Clustering; sparse clustering; feature selection; feature weighting; strong consistency

This paper proposes LW-k-means, a simple and efficient sparse clustering algorithm for high-dimensional data. The algorithm uses feature weighting to perform feature selection, and its time complexity is similar to that of Lloyd's method for k-means. The strong consistency of the LW-k-means procedure is also established. Experiments on synthetic and real-life datasets show that LW-k-means is competitive with existing methods for center-based high-dimensional clustering in both clustering accuracy and computational time.
In the context of high-dimensional clustering, feature weighting has gained considerable importance over the years as a way to capture the relative importance of different features in revealing the cluster structure of a dataset. However, the popular techniques in this area either fail to perform feature selection or do not preserve the simplicity of Lloyd's heuristic for solving the k-means problem and its variants. In this paper, we propose the Lasso Weighted k-means (LW-k-means) algorithm, a simple yet efficient sparse clustering procedure for high-dimensional data in which the number of features (p) can be much higher than the number of observations (n). The LW-k-means method imposes an ℓ1 regularization term directly on the feature weights to induce feature selection in a sparse clustering framework. We develop a simple block-coordinate descent type algorithm, with time complexity resembling that of Lloyd's method, to optimize the proposed objective. In addition, we establish the strong consistency of the LW-k-means procedure. Such an analysis of large-sample properties is, in general, not available for conventional sparse k-means algorithms. LW-k-means is tested on a number of synthetic and real-life datasets. Through a detailed experimental analysis, we find that the method is highly competitive with the baselines and with state-of-the-art procedures for center-based high-dimensional clustering, in terms of both clustering accuracy and computational time.
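
The abstract does not state the LW-k-means objective, so the following is only a minimal sketch of the general idea it describes: alternating (block-coordinate) updates between Lloyd-style clustering and ℓ1-penalized feature weights. It is not the authors' algorithm. The objective in the docstring, the function name `sparse_weighted_kmeans`, and the parameters `lam`, `n_outer`, and `n_inner` are all assumptions made for illustration. The weight step uses soft-thresholding, which is the closed-form maximizer of the assumed objective and sets the weights of uninformative features exactly to zero.

```python
import numpy as np


def sparse_weighted_kmeans(X, k, lam=0.2, n_outer=10, n_inner=50, seed=0):
    """Illustrative sparse feature-weighted k-means (NOT the paper's LW-k-means).

    Assumed objective, chosen for this sketch only:
        maximize  sum_l w_l * B_l  -  lam * sum_l |w_l|
        subject to  w_l >= 0  and  ||w||_2 <= 1,
    where B_l is the share of feature l's variance explained by the clusters.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n, p = X.shape

    # Standardize so every non-constant feature has total sum of squares n.
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0
    Z = (X - X.mean(axis=0)) / sd
    tss = (Z ** 2).sum(axis=0)

    w = np.full(p, 1.0 / np.sqrt(p))  # uniform start with unit l2 norm
    centers = Z[rng.choice(n, size=k, replace=False)].copy()
    labels = np.full(n, -1)

    for _ in range(n_outer):
        # Block 1: Lloyd iterations on weighted distances, weights held fixed.
        # Note: this builds an (n, k, p) array, fine for a sketch only.
        for _ in range(n_inner):
            dist = (((Z[:, None, :] - centers[None, :, :]) ** 2) * w).sum(axis=2)
            new_labels = dist.argmin(axis=1)
            if np.array_equal(new_labels, labels):
                break
            labels = new_labels
            for j in range(k):
                if np.any(labels == j):  # keep the old center if a cluster empties
                    centers[j] = Z[labels == j].mean(axis=0)

        # Block 2: closed-form weight update by soft-thresholding, clusters fixed.
        wcss = ((Z - centers[labels]) ** 2).sum(axis=0)
        gain = (tss - wcss) / n  # per-feature between-cluster share, in [0, 1]
        shrunk = np.maximum(gain - lam, 0.0)
        if not shrunk.any():
            raise ValueError("lam is too large: every feature weight was shrunk to zero")
        new_w = shrunk / np.linalg.norm(shrunk)
        if np.allclose(new_w, w):
            break
        w = new_w

    # Centers are returned in the standardized feature space.
    return labels, centers, w
```

Calling `labels, centers, w = sparse_weighted_kmeans(X, k=2)` on an array `X` of shape `(n, p)` returns the cluster labels, the centers, and the weight vector; features with `w[l] == 0` are the ones this sketch discards. The clustering block is run to convergence before each weight update so that the first weight step is not based on an arbitrary initial partition.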
