☆ 4.6 Article

A parallel text document clustering algorithm based on neighbors

CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS (2015)

期刊

CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS

卷 18, 期 2, 页码 933-948

出版社

SPRINGER

DOI: 10.1007/s10586-015-0450-z

关键词

Document clustering; Text mining; k-Means; Parallel algorithm; Cluster computing; Performance analysis

类别

Computer Science, Information Systems Computer Science, Theory & Methods

向作者/读者索取更多资源

Protocol

社区支持

Reagent

社区支持

摘要

In this paper, we propose a new parallel algorithm for text document clustering based on the concept of neighbor (Guha et al. in Inf Syst 25(5): 345-366, 2000). If two documents are similar enough, they are considered as neighbors of each other. The new algorithm is named parallel k-means based on neighbors (PKBN), and it is a parallel version of sequential k-means based on neighbors (SKBN) that we proposed in Luo et al. (Data Knowl Eng 68(11): 1271-1288, 2009). PKBN fully exploits the data-parallelism of SKBN and adopts a new parallel pair-generating method to build the neighbor matrix. Our new parallel pair-generating method causes less communication overhead between processors than existing methods. PKBN is designed for message-passing multiprocessor systems and is implemented on a cluster of Linux workstations to analyze its performance. Our experimental results on real-life data sets demonstrate that PKBN is very efficient and has good scalability with respect to the number of processors and the size of data set.

A parallel text document clustering algorithm based on neighbors

期刊

CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS

出版社

SPRINGER

关键词

类别

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

A parallel text document clustering algorithm based on neighbors

期刊

CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS

出版社

SPRINGER

关键词

类别

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

导出引文

分享论文