☆ 4.2 Article

The next-generation K-means algorithm

STATISTICAL ANALYSIS AND DATA MINING (2018)

Journal

STATISTICAL ANALYSIS AND DATA MINING

Volume 11, Issue 4, Pages 153-166

Publisher

WILEY

DOI: 10.1002/sam.11379

Keywords

clusterwise regression; hard classification; K-medians; maximum likelihood; multilevel data; robust clustering; SigClust

Categories

Computer Science, Artificial Intelligence; Computer Science, Interdisciplinary Applications; Statistics & Probability

Funding

National Cancer Institute [R01 CA200994, R01 CA211869, U01CA196386]
National Library of Medicine [1R56LM12371-01A1, LM012012-03U01CA196386, R01 LM012012-03]
NATIONAL CANCER INSTITUTE [P01CA190193] Funding Source: NIH RePORTER

Abstract

Typically, when referring to a model-based classification, the mixture distribution approach is understood. In contrast, we revive the hard-classification model-based approach developed by Banfield and Raftery (1993) for which K-means is equivalent to the maximum likelihood (ML) estimation. The next-generation K-means algorithm does not end after the classification is achieved, but moves forward to answer the following fundamental questions: Are there clusters, how many clusters are there, what are the statistical properties of the estimated means and index sets, what is the distribution of the coefficients in the clusterwise regression, and how to classify multilevel data? The statistical model-based approach for the K-means algorithm is the key, because it allows statistical simulations and studying the properties of classification following the track of the classical statistics. This paper illustrates the application of the ML classification to testing the no-clusters hypothesis, to studying various methods for selection of the number of clusters using simulations, robust clustering using Laplace distribution, studying properties of the coefficients in clusterwise regression, and finally to multilevel data by marrying the variance components model with K-means.
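The equivalence between K-means and ML estimation mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's code: the function names `kmeans_hard_ml` and `classification_loglik` are hypothetical, the initialisation from random observations is a choice made here, and the likelihood assumes spherical Gaussian clusters with a common variance. Under that assumption the classification log-likelihood is a decreasing function of the within-cluster sum of squares, so minimising one maximises the other.

```python
import numpy as np


def kmeans_hard_ml(X, k, n_iter=100, seed=0):
    """Lloyd's iterations: alternate hard assignment and mean updates.

    Returns the centres, the hard labels (index sets), and the
    within-cluster sum of squares (WSS).
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Assumption of this sketch: start from k distinct observations.
    centres = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centre.
        d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable, so the centres are too
        labels = new_labels
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # keep the old centre if a cluster empties
                centres[j] = members.mean(axis=0)
    wss = float(((X - centres[labels]) ** 2).sum())
    return centres, labels, wss


def classification_loglik(wss, n, m):
    """Log-likelihood of the hard-classification model at its maximum.

    Assumes n observations in m dimensions, spherical Gaussian clusters
    with a common variance, and the variance replaced by its ML estimate
    wss / (n * m). The result decreases monotonically as wss grows.
    """
    sigma2 = wss / (n * m)
    return -0.5 * n * m * (np.log(2.0 * np.pi * sigma2) + 1.0)
```

Calling `kmeans_hard_ml(X, k)` on an `n` by `m` data matrix and passing the returned WSS to `classification_loglik(wss, n, m)` gives the maximised log-likelihood for that `k`. Having an explicit likelihood is what makes simulation-based questions, such as testing the no-clusters hypothesis, expressible in this framework.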
