Article

PUMA: Parallel subspace clustering of categorical data using multi-attribute weights

EXPERT SYSTEMS WITH APPLICATIONS (2019)

Journal

EXPERT SYSTEMS WITH APPLICATIONS

Volume 126, Pages 233-245

Publisher

PERGAMON-ELSEVIER SCIENCE LTD

DOI: 10.1016/j.eswa.2019.02.030

Keywords

Parallel subspace clustering; Multi-attribute weights; High dimension; Categorical data; MapReduce

Categories

Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic; Operations Research & Management Science

Funding

National Natural Science Foundation of P. R. China [61876122]
Science and Technological Innovation Team of Shanxi Province [201805D131007]
U.S. National Science Foundation [CCF-0845257]

Abstract

There are two main reasons why traditional clustering schemes are incompetent for high-dimensional categorical data. First, traditional methods usually represent each cluster by all dimensions without difference; and second, traditional clustering methods only rely on an individual dimension of projection as an attribute's weight ignoring relevance among attributes. We solve these two problems by a MapReduce-based subspace clustering algorithm (called PUMA) using multi-attribute weights. The attribute subspaces are constructed in our PUMA by calculating an attribute-value weight based on the co-occurrence probability of attribute values among different dimensions. PUMA obtains sub-clusters corresponding to respective attribute subspaces from each computing node in parallel. Lastly, PUMA measures various scale clusters by applying the hierarchical clustering method to iteratively merge sub-clusters. We implement PUMA on a 24-node Hadoop cluster. Experimental results reveal that using multi-attribute weights with subspace clustering can achieve better clustering accuracy on both synthetic and real-world high dimensional datasets. Experimental results also show that PUMA achieves high performance in terms of extensibility, scalability and the nearly linear speedup with respect to number of nodes. Additionally, experimental results demonstrate that PUMA is reasonable, effective, and practical to expert systems such as knowledge acquisition, word sense disambiguation, automatic abstracting and recommender systems. (C) 2019 Elsevier Ltd. All rights reserved.
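The abstract says attribute-value weights are derived from the co-occurrence probability of attribute values among different dimensions, but it does not give the formula. As a rough illustration only, the sketch below computes the empirical joint probability of every pair of values drawn from two different attributes. The function name, the key layout, and the toy records are assumptions made for this example; this is not PUMA's weighting scheme, and it runs on a single machine rather than on MapReduce.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_probabilities(records):
    """Return the empirical joint probability of each pair of attribute
    values taken from two different dimensions (columns).

    Keys are (dim_i, value_i, dim_j, value_j) with dim_i < dim_j.
    Hypothetical helper for illustration; not taken from the paper.
    """
    n = len(records)
    counts = Counter()
    for row in records:
        # Pair the values of every two distinct dimensions in this record.
        for (i, u), (j, v) in combinations(enumerate(row), 2):
            counts[(i, u, j, v)] += 1
    # Normalise counts by the number of records.
    return {key: c / n for key, c in counts.items()}


# Toy categorical dataset (made up): colour, size, shape.
records = [
    ("red", "small", "round"),
    ("red", "small", "square"),
    ("blue", "large", "round"),
    ("red", "large", "round"),
]

probs = cooccurrence_probabilities(records)
print(probs[(0, "red", 1, "small")])
```

A weighting scheme could aggregate these pairwise probabilities for each attribute value, so that a value which frequently co-occurs with values in other dimensions counts for more; how PUMA actually aggregates them is specified in the paper, not here.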
