Article

Best K: critical clustering structures in categorical datasets

Journal

KNOWLEDGE AND INFORMATION SYSTEMS
Volume 20, Issue 1, Pages 1-33

Publisher

SPRINGER LONDON LTD
DOI: 10.1007/s10115-008-0159-x

Keywords

Categorical data clustering; Entropy; Cluster validation

Demand for cluster analysis of categorical data has continued to grow over the last decade. A well-known problem in categorical clustering is determining the best number of clusters, K. Although several categorical clustering algorithms have been developed, surprisingly, none has satisfactorily addressed the problem of finding the best K. Because categorical data has no inherent distance function to serve as a similarity measure, traditional cluster validation techniques based on geometric shapes and density distributions are not appropriate for it. In this paper, we study the entropy property between clustering results of categorical data with different numbers of clusters K, and propose the BKPlot method to address three important cluster validation problems: (1) How can we determine whether there is significant clustering structure in a categorical dataset? (2) If there is significant clustering structure, what is the set of candidate best Ks? (3) If the dataset is large, how can we efficiently and reliably determine the best Ks?
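
As a rough illustration of the kind of entropy measure the abstract alludes to, the sketch below computes a size-weighted entropy for a partition of categorical records. It is not the paper's BKPlot procedure, which the abstract does not specify. The function names (`column_entropy`, `cluster_entropy`, `expected_entropy`), the choice to sum per-column entropies, and the toy data are all assumptions made for illustration.

```python
from collections import Counter
from math import log2


def column_entropy(values):
    """Shannon entropy (in bits) of one categorical column."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def cluster_entropy(rows):
    """Sum of per-column entropies over the records of one cluster.

    Summing columns treats attributes as independent; this is an
    illustrative assumption, not something stated in the abstract.
    """
    return sum(column_entropy(col) for col in zip(*rows))


def expected_entropy(rows, labels):
    """Size-weighted average of cluster entropies for a given partition."""
    clusters = {}
    for row, label in zip(rows, labels):
        clusters.setdefault(label, []).append(row)
    n = len(rows)
    return sum(len(m) / n * cluster_entropy(m) for m in clusters.values())


# Hypothetical toy dataset: two attributes, two obvious groups.
rows = [("a", "x"), ("a", "x"), ("b", "y"), ("b", "y")]

one_cluster = expected_entropy(rows, [0, 0, 0, 0])    # K = 1
good_split = expected_entropy(rows, [0, 0, 1, 1])     # K = 2, matches the groups
bad_split = expected_entropy(rows, [0, 1, 0, 1])      # K = 2, cuts across the groups
```

On this toy data, the split that matches the two groups lowers the weighted entropy from 2 bits to 0, while the split that cuts across them leaves it at 2 bits. Comparing how such a quantity changes as K varies is one plausible reading of "the entropy property between the clustering results ... with different K".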
