An Efficient Topic Modeling Approach for Text Mining and Information Retrieval through K-means Clustering

MEHRAN UNIVERSITY RESEARCH JOURNAL OF ENGINEERING AND TECHNOLOGY (2020)

Journal

MEHRAN UNIVERSITY RESEARCH JOURNAL OF ENGINEERING AND TECHNOLOGY

Volume 39, Issue 1, Pages 213-222

Publisher

MEHRAN UNIV ENGINEERING & TECHNOLOGY

DOI: 10.22581/muet1982.2001.20

Keywords

Topic Modeling; Local term weighting; Entropy; Bag-of-words; Principal component analysis; K-means

Category

Engineering, Multidisciplinary

Funding

University of Engineering and Technology, Taxila, Pakistan

Abstract

Topic modeling is an effective text mining and information retrieval approach for organizing documents with varied content under specific topics. Text documents in the form of news articles are growing rapidly on the web, and analyzing them is important in both text mining and information retrieval, yet extracting meaningful information from them remains challenging. Topic modeling is one approach to discovering the themes of text documents, but it still needs a new perspective to improve its performance. In topic modeling, documents are composed of topics and topics are collections of words. In this paper, we propose a new k-means topic modeling (KTM) approach based on the k-means clustering algorithm. KTM discovers better semantic topics from a collection of documents. Experiments on two real-world datasets, Reuters-21578 and BBC News, show that KTM performs better than state-of-the-art topic models such as LDA (Latent Dirichlet Allocation) and LSA (Latent Semantic Analysis). KTM is also applicable to classification and clustering tasks in text mining, where it achieves higher performance than its competitors LDA and LSA.
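
To make the general idea concrete, the following is a minimal sketch of clustering bag-of-words vectors with k-means and reading each centroid's heaviest terms as a "topic". It is not the authors' KTM implementation: the abstract does not give the algorithm's details, so the toy corpus, the helper names (`bag_of_words`, `farthest_first_init`, `kmeans`, `top_words`), the raw term counts, and the deterministic seeding are all assumptions made for illustration. The local term weighting, entropy, and principal component analysis steps named in the keywords are left out.

```python
from collections import Counter


def bag_of_words(docs):
    """Return the sorted vocabulary and one term-count vector per document."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({word for tokens in tokenized for word in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([float(counts[word]) for word in vocab])
    return vocab, vectors


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_init(vectors, k):
    """Deterministic seeding (an assumption, not from the paper): start at
    document 0, then repeatedly add the document farthest from all seeds."""
    centroids = [vectors[0][:]]
    while len(centroids) < k:
        farthest = max(
            vectors,
            key=lambda v: min(squared_distance(v, c) for c in centroids),
        )
        centroids.append(farthest[:])
    return centroids


def kmeans(vectors, k, max_iter=100):
    """Plain Lloyd's algorithm: assign to nearest centroid, then re-average."""
    centroids = farthest_first_init(vectors, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda j: squared_distance(v, centroids[j]))
            for v in vectors
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def top_words(vocab, centroid, n=3):
    """Terms with the largest centroid weight; ties broken alphabetically."""
    ranked = sorted(zip(vocab, centroid), key=lambda pair: (-pair[1], pair[0]))
    return [word for word, _ in ranked[:n]]


# Hypothetical four-document corpus, for illustration only.
docs = [
    "stocks market shares trading market",
    "market shares investors stocks",
    "football match goal team",
    "team goal football league match",
]
vocab, vectors = bag_of_words(docs)
labels, centroids = kmeans(vectors, k=2)
for j, centroid in enumerate(centroids):
    print(j, top_words(vocab, centroid))
```

On this toy corpus the two finance documents and the two football documents fall into separate clusters, and the top-weighted terms of each centroid act as a rough topic description. A real pipeline would add proper tokenization, stop-word removal, term weighting, and dimensionality reduction before clustering.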
