期刊
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING
卷 28, 期 8, 页码 2173-2186出版社
IEEE COMPUTER SOC
DOI: 10.1109/TKDE.2016.2551240
关键词
Clustering; external validity index; cluster validation; comparing clusterings; normalization; correction for chance; adjustment for chance
类别
资金
- MOPIS
- Nokia Foundation
Comparing two clustering results of a data set is a challenging task in cluster analysis. Many external validity measures have been proposed in the literature. A good measure should be invariant to the changes of data size, cluster size, and number of clusters. We give an overview of existing set matching indexes and analyze their properties. Set matching measures are based on matching clusters from two clusterings. We analyze the measures in three parts: 1) cluster similarity, 2) matching, and 3) overall measurement. Correction for chance is also investigated and we prove that normalized mutual information and variation of information are intrinsically corrected. We propose a new scheme of experiments based on synthetic data for evaluation of an external validity index. Accordingly, popular external indexes are evaluated and compared when applied to clusterings of different data size, cluster size, and number of clusters. The experiments show that set matching measures are clearly better than the other tested. Based on the analytical comparisons, we introduce a new index called Pair Sets Index (PSI).
作者
我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。
推荐
暂无数据