期刊
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING
卷 14, 期 4, 页码 673-690出版社
IEEE COMPUTER SOC
DOI: 10.1109/TKDE.2002.1019208
关键词
agglomerative clustering; conceptual clustering; feature weighting; interpretation; knowledge discovery; mixed numeric and nominal data; similarity measures; chi(2) aggregation
This paper presents a Similarity-Based Agglomerative Clustering (SBAC) algorithm that works well for data with mixed numeric and nominal features. Asimilarity measure, proposed by Goodall for biological taxonomy [15], that gives greater weight to uncommon feature value matches in similarity computations and makes no assumptions of the underlying distributions of the feature values, is adopted to define the similarity measure between pairs of objects. An agglomerative algorithm is employed to construct a dendrogram and a simple distinctness heuristic is used to extract a partition of the data. The performance of SBAC has been studied on real and artificially generated data sets. Results demonstrate the effectiveness of this algorithm in unsupervised discovery tasks. Comparisons with other clustering schemes illustrate the superior performance of this approach.
作者
我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。
推荐
暂无数据