Article

PUMA: Parallel subspace clustering of categorical data using multi-attribute weights

Journal

EXPERT SYSTEMS WITH APPLICATIONS
Volume 126, Pages 233-245

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.eswa.2019.02.030

Keywords

Parallel subspace clustering; Multi-attribute weights; High dimension; Categorical data; MapReduce

Funding

  1. National Natural Science Foundation of P. R. China [61876122]
  2. Science and Technological Innovation Team of Shanxi Province [201805D131007]
  3. U.S. National Science Foundation [CCF-0845257]

There are two main reasons why traditional clustering schemes are ill-suited to high-dimensional categorical data. First, traditional methods usually represent each cluster using all dimensions equally. Second, traditional clustering methods derive an attribute's weight only from its projection onto a single dimension, ignoring the relevance among attributes. We address both problems with a MapReduce-based subspace clustering algorithm, called PUMA, that uses multi-attribute weights. PUMA constructs attribute subspaces by calculating an attribute-value weight based on the co-occurrence probability of attribute values across different dimensions. PUMA then obtains, in parallel on each computing node, the sub-clusters corresponding to the respective attribute subspaces. Lastly, PUMA produces clusters at various scales by applying hierarchical clustering to iteratively merge sub-clusters. We implement PUMA on a 24-node Hadoop cluster. Experimental results reveal that combining multi-attribute weights with subspace clustering achieves better clustering accuracy on both synthetic and real-world high-dimensional datasets. The results also show that PUMA performs well in terms of extensibility and scalability, with nearly linear speedup with respect to the number of nodes. Additionally, the experimental results demonstrate that PUMA is reasonable, effective, and practical for expert-system applications such as knowledge acquisition, word sense disambiguation, automatic abstracting, and recommender systems. (C) 2019 Elsevier Ltd. All rights reserved.
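
The abstract does not give PUMA's exact weighting formula. As a rough illustration only, the Python sketch below computes one hypothetical co-occurrence-based weight for each attribute value: weight(a, v) is the mean, over every other attribute b, of max_u P(b = u | a = v). It then chooses a subspace for a group of records by keeping the attributes whose most frequent value has a weight at or above a threshold. The function names attribute_value_weights and select_subspace and the default threshold of 0.8 are assumptions made for this example, not the authors' method; the MapReduce distribution and hierarchical merging steps are omitted.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

# Key type: (attribute index, attribute value)
ValueKey = Tuple[int, str]


def attribute_value_weights(rows: Sequence[Sequence[str]]) -> Dict[ValueKey, float]:
    """Hypothetical co-occurrence weight for every (attribute, value) pair.

    weight(a, v) = mean over attributes b != a of max_u P(b = u | a = v),
    i.e. how consistently value v of attribute a co-occurs with a single
    value in each of the other dimensions. Assumes all rows have equal length.
    """
    if not rows or len(rows[0]) < 2:
        raise ValueError("need at least one row with two or more attributes")
    n_attrs = len(rows[0])
    value_counts: Counter = Counter()
    # pair_counts[(a, v, b)][u] = number of rows with a = v and b = u
    pair_counts: Dict[Tuple[int, str, int], Counter] = defaultdict(Counter)
    for row in rows:
        for a, v in enumerate(row):
            value_counts[(a, v)] += 1
            for b, u in enumerate(row):
                if b != a:
                    pair_counts[(a, v, b)][u] += 1

    weights: Dict[ValueKey, float] = {}
    for (a, v), count in value_counts.items():
        # P(most frequent co-occurring value | a = v) for each other attribute b
        probs = [
            max(pair_counts[(a, v, b)].values()) / count
            for b in range(n_attrs)
            if b != a
        ]
        weights[(a, v)] = sum(probs) / len(probs)
    return weights


def select_subspace(
    rows: Sequence[Sequence[str]],
    weights: Dict[ValueKey, float],
    threshold: float = 0.8,
) -> List[int]:
    """Hypothetical: keep attributes whose most frequent value in `rows` has weight >= threshold."""
    subspace = []
    for a in range(len(rows[0])):
        mode_value, _ = Counter(row[a] for row in rows).most_common(1)[0]
        if weights.get((a, mode_value), 0.0) >= threshold:
            subspace.append(a)
    return subspace


if __name__ == "__main__":
    rows = [
        ("red", "small", "round"),
        ("red", "small", "round"),
        ("red", "large", "square"),
        ("blue", "large", "square"),
    ]
    weights = attribute_value_weights(rows)
    print(round(weights[(0, "red")], 3))       # 0.667
    print(select_subspace(rows[:2], weights))  # [1, 2]
```

In this toy data, "red" gets a weight of 2/3 because it co-occurs with "small" and "round" in only two of its three rows, whereas "small" and "round" always co-occur with each other and with "red". For the first two rows, attribute 0 therefore falls below the threshold and the selected subspace is attributes 1 and 2.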
