Article

RSKC: An R Package for a Robust and Sparse K-Means Clustering Algorithm

Journal

JOURNAL OF STATISTICAL SOFTWARE
Volume 72, Issue 5, Pages 1-26

Publisher

JOURNAL STATISTICAL SOFTWARE
DOI: 10.18637/jss.v072.i05

Keywords

K-means; robust clustering; sparse clustering; trimmed K-means

Abstract

Witten and Tibshirani (2010) proposed an algorithm, called sparse K-means (SK-means), that simultaneously finds clusters and selects clustering variables. SK-means is particularly useful when the dataset has a large fraction of noise variables (that is, variables without useful information to separate the clusters). SK-means works very well on clean and complete data but cannot handle outliers or missing data. To remedy these problems, we introduce a new robust and sparse K-means clustering algorithm (RSK-means), implemented in the R package RSKC. We demonstrate the use of our package on four datasets. We also conduct a Monte Carlo study to compare the performance of RSK-means and SK-means in selecting important variables and identifying clusters. Our simulation study shows that RSK-means performs well on clean data and better than SK-means and other competitors on outlier-contaminated data.
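
The keywords mention trimmed K-means, which is one standard way to make K-means resistant to outliers. The following is a minimal Python sketch of that trimming idea only. It is not the RSKC implementation and it omits the variable-weighting (sparsity) part entirely. The function name `trimmed_kmeans`, its parameters, and the toy data are hypothetical, chosen purely for illustration.

```python
import math

def trimmed_kmeans(points, centers, alpha=0.1, n_iter=20):
    """Toy trimmed K-means (illustrative only, not the RSKC code).

    Runs Lloyd-style iterations, but the alpha fraction of points that lie
    farthest from their nearest center is ignored when centers are updated.
    """
    n_trim = int(math.floor(alpha * len(points)))
    centers = [list(c) for c in centers]
    labels, trimmed = [], set()
    for _ in range(n_iter):
        # Assignment step: nearest center and squared distance for each point.
        labels, dists = [], []
        for p in points:
            d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            k = min(range(len(centers)), key=d.__getitem__)
            labels.append(k)
            dists.append(d[k])
        # Trimming step: flag the n_trim points with the largest distances.
        order = sorted(range(len(points)), key=dists.__getitem__, reverse=True)
        trimmed = set(order[:n_trim])
        # Update step: recompute each center from its untrimmed members only.
        for k in range(len(centers)):
            members = [points[i] for i in range(len(points))
                       if labels[i] == k and i not in trimmed]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels, sorted(trimmed)

# Hypothetical toy data: two tight groups plus one gross outlier (index 6).
points = [(0.0, 0.0), (0.2, 0.1), (0.1, -0.1),
          (5.0, 5.0), (5.1, 4.9), (4.9, 5.2),
          (50.0, -40.0)]
centers, labels, trimmed = trimmed_kmeans(
    points, centers=[(0.0, 0.0), (5.0, 5.0)], alpha=0.15)
print(centers, labels, trimmed)
```

On this toy input, the isolated point at (50, -40) is the single observation trimmed, so it does not pull either center away from its group. An untrimmed update would average it into whichever cluster it was assigned to.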
