Article

Improved Text Clustering Using k-Mean Bayesian Vectoriser

JOURNAL OF INFORMATION & KNOWLEDGE MANAGEMENT (2014)

Journal

JOURNAL OF INFORMATION & KNOWLEDGE MANAGEMENT

Volume 13, Issue 3, Pages -

Publisher

WORLD SCIENTIFIC PUBL CO PTE LTD

DOI: 10.1142/S0219649214500269

Keywords

k-means; naive Bayes; text clustering; Arabic text

Category

Information Science & Library Science

Funding

Universiti Teknologi Malaysia (UTM), Ministry of Higher Education (MOHE) Malaysia [03H02, 01G72]
Umm Al-Qura University
Ministry of Higher Education Saudi Arabia

Abstract

In the literature, high-dimensional data has been shown to reduce the efficiency of clustering algorithms and increase execution time. In this paper, we therefore propose an approach called BV-kmeans (Bayesian Vectorisation with k-means) that aims to improve document representation models for text clustering. The approach integrates k-means document clustering with a Bayesian Vectoriser, which computes the probability distribution of the documents in the vector space, in order to overcome the problems of high-dimensional data and reduce running time. We used several similarity measures, namely K divergence, squared Euclidean distance and squared χ² distance, to determine which metrics are effective for modelling the similarity between documents under the proposed approach. We evaluated the approach on high-dimensional data from a set of common newspaper websites. Experimental results show that, compared with the standard k-means algorithm, the proposed approach increases the degree to which a cluster contains documents from a specific category by 85% and lowers the runtime by 95%.
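
The abstract does not spell out how the Bayesian Vectoriser is built, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a naive Bayes model (suggested by the keywords) trained on a few labelled seed documents, and uses the resulting posterior distribution over categories as a low-dimensional document vector that k-means then clusters. The function names, the toy English data, the deterministic initialisation, and the exact forms of the three measures (for example, K divergence taken as the sum of p·log(2p/(p+q))) are all assumptions made for illustration.

```python
import math
from collections import Counter

def train_naive_bayes(labelled_docs):
    """Fit multinomial naive Bayes with add-one smoothing (assumed vectoriser model)."""
    word_counts, doc_counts = {}, Counter()
    for text, label in labelled_docs:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    labels = sorted(doc_counts)
    total_docs = sum(doc_counts.values())
    model = {}
    for label in labels:
        total = sum(word_counts[label].values())
        log_like = {w: math.log((word_counts[label][w] + 1) / (total + len(vocab)))
                    for w in vocab}
        model[label] = (math.log(doc_counts[label] / total_docs), log_like)
    return labels, model

def bayesian_vector(text, labels, model):
    """Map a document to P(label | text): one dimension per category, not per word."""
    scores = []
    for label in labels:
        log_prior, log_like = model[label]
        # Words outside the training vocabulary are ignored.
        scores.append(log_prior + sum(log_like[w] for w in text.lower().split()
                                      if w in log_like))
    top = max(scores)  # subtract the max before exponentiating for stability
    weights = [math.exp(s - top) for s in scores]
    norm = sum(weights)
    return [w / norm for w in weights]

def squared_euclidean(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def squared_chi2(p, q):
    return sum((a - b) ** 2 / (a + b) for a, b in zip(p, q) if a + b > 0)

def k_divergence(p, q):
    return sum(a * math.log(2 * a / (a + b)) for a, b in zip(p, q) if a > 0)

def kmeans(vectors, k, distance=squared_euclidean, iterations=20):
    """Plain k-means; the first k vectors seed the centroids (an assumption)."""
    centroids = [list(v) for v in vectors[:k]]
    assignment = []
    for _ in range(iterations):
        new_assignment = [min(range(k), key=lambda j: distance(v, centroids[j]))
                          for v in vectors]
        if new_assignment == assignment:
            break
        assignment = new_assignment
        for j in range(k):
            members = [v for v, a in zip(vectors, assignment) if a == j]
            if members:  # the mean of probability vectors is still a distribution
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignment

# Hypothetical toy data, for illustration only.
seed = [("goal match team player", "sport"),
        ("election vote minister parliament", "politics")]
docs = ["team wins match with late goal", "minister calls election vote",
        "player scores goal", "parliament vote delayed"]

labels, model = train_naive_bayes(seed)
vectors = [bayesian_vector(d, labels, model) for d in docs]
clusters = kmeans(vectors, k=2, distance=k_divergence)
print(clusters)
```

In this sketch each document vector has as many dimensions as there are categories rather than as many as there are vocabulary terms, which is one plausible reading of how the approach reduces dimensionality and running time. Swapping `distance` between the three functions makes it possible to compare the measures on the same vectors.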
