Article

Automatic Scientific Document Clustering Using Self-organized Multi-objective Differential Evolution

Journal

COGNITIVE COMPUTATION
Volume 11, Issue 2, Pages 271-293

Publisher

SPRINGER
DOI: 10.1007/s12559-018-9611-8

Keywords

Clustering; Cluster validity indices; Self Organizing Map (SOM); Differential Evolution (DE); Polynomial mutation; Multi-objective Optimization (MOO)

Funding

  1. SERB Women in Excellence Award [SB/WEA/08/2017]


Document clustering is the partitioning of a given collection of documents into K groups based on a similarity/dissimilarity criterion. The task has applications in scope detection of journals and conferences, development of automated peer-review support systems, topic modeling, recent cognitive-inspired work on text summarization, and semantics-based classification of documents. This paper proposes a cognitive-inspired multi-objective automatic document clustering technique that fuses a self-organizing map (SOM) with multi-objective differential evolution. Each solution in the population encodes a variable number of cluster centers, so the number of clusters is determined from the data set automatically. These solutions undergo various genetic operations during evolution, and the concept of SOM is used to design new genetic operators for the proposed clustering technique. To measure the goodness of a clustering solution, two cluster validity indices, the Pakhira-Bandyopadhyay-Maulik index and the Silhouette index, are optimized simultaneously. The effectiveness of the proposed approach, named the self-organizing map based multi-objective document clustering technique (SMODoc_clust), is shown in the automatic classification of scientific articles and web documents. Different representation schemes, including tf, tf-idf, and word embeddings, are employed to convert articles into vector form. Comparative results with respect to two internal cluster validity indices, the Dunn index and the Davies-Bouldin index, are shown against several state-of-the-art clustering techniques: three multi-objective clustering techniques (MOCK, VAMOSA, and NSGA-II-Clust), a single-objective genetic algorithm (SOGA) based clustering technique, K-means, and single-linkage clustering. The results show that the proposed approach outperforms these existing approaches in the reported comparisons, and the results are validated using statistical significance t-tests.
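
The two objectives named in the abstract can be illustrated with a minimal sketch. It assumes the commonly cited definitions of the Pakhira-Bandyopadhyay-Maulik (PBM) index and the Silhouette index with Euclidean distance, and it uses toy 2-D points in place of tf, tf-idf, or word-embedding document vectors. The function names (`group_by_label`, `pbm_index`, `silhouette_index`) and the data are hypothetical, chosen for illustration; this is not the authors' SMODoc_clust implementation and it omits the SOM and differential evolution steps entirely.

```python
import math


def group_by_label(points, labels):
    """Collect points into a dict: cluster label -> list of member points."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    return clusters


def centroid(points):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(points)
    return [sum(column) / n for column in zip(*points)]


def pbm_index(points, labels):
    """PBM index (higher is better): ((1/K) * (E_1 / E_K) * D_K) ** 2."""
    clusters = group_by_label(points, labels)
    k = len(clusters)
    if k < 2:
        raise ValueError("PBM index needs at least two clusters")
    centers = {label: centroid(members) for label, members in clusters.items()}
    global_center = centroid(points)
    # E_1: total distance of all points to the centroid of the whole data set.
    e_1 = sum(math.dist(p, global_center) for p in points)
    # E_K: total distance of each point to its own cluster center.
    e_k = sum(math.dist(p, centers[label])
              for label, members in clusters.items() for p in members)
    # D_K: largest distance between any two cluster centers.
    center_list = list(centers.values())
    d_k = max(math.dist(a, b)
              for i, a in enumerate(center_list) for b in center_list[i + 1:])
    return ((e_1 / e_k) * d_k / k) ** 2


def silhouette_index(points, labels):
    """Mean silhouette width over all points (range -1 to 1, higher is better)."""
    clusters = group_by_label(points, labels)
    if len(clusters) < 2:
        raise ValueError("Silhouette index needs at least two clusters")
    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        # (the distance from the point to itself contributes 0 to the sum).
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(sum(math.dist(point, q) for q in other) / len(other)
                for other_label, other in clusters.items()
                if other_label != label)
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy data: two well-separated groups standing in for document vectors.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
good = [0, 0, 0, 0, 1, 1, 1, 1]    # labels that match the two groups
mixed = [0, 1, 0, 1, 0, 1, 0, 1]   # labels that cut across both groups

for name, labels in (("good", good), ("mixed", mixed)):
    print(name, pbm_index(points, labels), silhouette_index(points, labels))
```

Both indices are maximized, so a partition that respects the two groups should score higher on each than one that mixes them. In a multi-objective setting such as the one the abstract describes, each candidate partition would be scored on both values at once rather than on a single combined number.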
