Article

Variable selection for model-based clustering

Journal

JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION
Volume 101, Issue 473, Pages 168-178

Publisher

TAYLOR & FRANCIS INC
DOI: 10.1198/016214506000000113

Keywords

Bayes factor; BIC; feature selection; model-based clustering; unsupervised learning; variable selection

We consider the problem of variable or feature selection for model-based clustering. The problem of comparing two nested subsets of variables is recast as a model comparison problem and addressed using approximate Bayes factors. A greedy search algorithm is proposed for finding a local optimum in model space. The resulting method selects variables (or features), the number of clusters, and the clustering model simultaneously. We applied the method to several simulated and real examples and found that removing irrelevant variables often improved performance. Compared with methods based on all of the variables, our variable selection method consistently yielded more accurate estimates of the number of groups and lower classification error rates, as well as more parsimonious clustering models and easier visualization of results.
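
The greedy search described above can be illustrated with a short Python sketch. It is not the authors' implementation. The abstract does not spell out the two models being compared, so the sketch assumes that a candidate variable is either useful for clustering (a Gaussian mixture is fitted to the selected variables plus the candidate) or explained by a linear regression on the variables already selected. The BIC difference between these two models stands in for twice the log Bayes factor. The use of scikit-learn, the function names `clustering_bic`, `regression_bic`, and `greedy_forward_selection`, the forward-only search, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def clustering_bic(X, max_components=3, seed=0):
    """Best BIC (2*loglik - penalty, higher is better) over 2..max_components clusters."""
    best = -np.inf
    for k in range(2, max_components + 1):
        gmm = GaussianMixture(n_components=k, covariance_type="full",
                              n_init=2, random_state=seed).fit(X)
        best = max(best, -gmm.bic(X))  # scikit-learn's bic() is lower-is-better
    return best


def regression_bic(y, X_sel):
    """BIC of a Gaussian linear regression of y on X_sel (intercept included)."""
    n = y.shape[0]
    design = np.hstack([np.ones((n, 1)), X_sel])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = np.sum((y - design @ coef) ** 2)
    loglik = -0.5 * n * (np.log(2 * np.pi * rss / n) + 1)
    n_params = design.shape[1] + 1  # coefficients plus error variance
    return 2 * loglik - n_params * np.log(n)


def greedy_forward_selection(X, max_components=3):
    """Add the variable with the largest positive BIC difference until none remains."""
    d = X.shape[1]
    selected = []
    while len(selected) < d:
        sel = np.array(selected, dtype=int)
        base = clustering_bic(X[:, sel], max_components) if selected else 0.0
        best_j, best_diff = None, 0.0
        for j in range(d):
            if j in selected:
                continue
            # Model A (assumed): candidate j carries clustering information.
            with_j = clustering_bic(X[:, selected + [j]], max_components)
            # Model B (assumed): j depends on the clusters only through selected variables.
            without_j = base + regression_bic(X[:, j], X[:, sel])
            diff = with_j - without_j  # approximates 2 * log Bayes factor
            if diff > best_diff:
                best_j, best_diff = j, diff
        if best_j is None:  # no candidate improves the model: local optimum
            break
        selected.append(best_j)
    return selected


# Synthetic data (hypothetical): columns 0 and 1 separate two groups, 2 and 3 are noise.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=200)
informative = np.where(labels[:, None] == 0, -4.0, 4.0) + rng.normal(size=(200, 2))
noise = rng.normal(size=(200, 2))
X = np.hstack([informative, noise])

selected = greedy_forward_selection(X)
print("selected variable indices:", selected)
```

Because the search only ever adds variables, it stops at a local optimum in model space. A fuller version would also consider removing variables that were selected earlier.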
