☆ 4.7 Article

A semiparametric method for clustering mixed data

MACHINE LEARNING (2016)

期刊

MACHINE LEARNING

卷 105, 期 3, 页码 419-458

出版社

SPRINGER

DOI: 10.1007/s10994-016-5575-7

关键词

Clustering; Unsupervised learning; Mixed data; k-means; Finite mixture models; Big data

类别

Computer Science, Artificial Intelligence

向作者/读者索取更多资源

Protocol

社区支持

Reagent

社区支持

摘要

Despite the existence of a large number of clustering algorithms, clustering remains a challenging problem. As large datasets become increasingly common in a number of different domains, it is often the case that clustering algorithms must be applied to heterogeneous sets of variables, creating an acute need for robust and scalable clustering methods for mixed continuous and categorical scale data. We show that current clustering methods for mixed-type data are generally unable to equitably balance the contribution of continuous and categorical variables without strong parametric assumptions. We develop KAMILA (KAy-means for MIxed LArge data), a clustering method that addresses this fundamental problem directly. We study theoretical aspects of our method and demonstrate its effectiveness in a series of Monte Carlo simulation studies and a set of real-world applications.

A semiparametric method for clustering mixed data

期刊

MACHINE LEARNING

出版社

SPRINGER

关键词

类别

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

A semiparametric method for clustering mixed data

期刊

MACHINE LEARNING

出版社

SPRINGER

关键词

类别

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

导出引文

分享论文