☆ 3.8 Article

Urdu News Clustering Using K-Mean Algorithm On The Basis Of Jaccard Coefficient And Dice Coefficient Similarity

ADCAIJ-ADVANCES IN DISTRIBUTED COMPUTING AND ARTIFICIAL INTELLIGENCE JOURNAL (2021)

期刊

ADCAIJ-ADVANCES IN DISTRIBUTED COMPUTING AND ARTIFICIAL INTELLIGENCE JOURNAL

卷 10, 期 4, 页码 381-399

出版社

EDICIONES UNIV SALAMANCA

DOI: 10.14201/ADCAIJ2021104381399

关键词

Urdu News; Clustering Mechanism; Jaccard Coefficient; Dice coefficient; Python; WEKA; K-mean; MSE

类别

Computer Science, Artificial Intelligence

向作者/读者索取更多资源

Protocol

社区支持

Reagent

社区支持

智能总结 New
摘要

The research utilized Jaccard and Dice coefficients, along with the k-means algorithm, for similarity analysis and clustering of Urdu news documents, with the experimental results indicating that the Dice coefficient outperforms Jaccard similarity.

Clustering is the unsupervised machine learning process that group data objects into clusters such that objects within the same cluster are highly similar to one another. Every day the quantity of Urdu text is increasing at a high speed on the internet. Grouping Urdu news manually is almost impossible, and there is an utmost need to device a mechanism which cluster Urdu news documents based on their similarity. Clustering Urdu news documents with accuracy is a research issue and it can be solved by using similarity techniques i.e., Jaccard and Dice coefficient, and clustering k-mean algorithm. In this research, the Jaccard and Dice coefficient has been used to find the similarity score of Urdu News documents in python programming language. For the purpose of clustering, the similarity results have been loaded to Waikato Environment for Knowledge Analysis (WEKA), by using k-mean algorithm the Urdu news documents have been clustered into five clusters. The obtained cluster's results were evaluated in terms of Accuracy and Mean Square Error (MSE). The Accuracy and MSE of Jaccard was 85% and 44.4%, while the Accuracy and MSE of Dice coefficient was 87% and 35.76%. The experimental result shows that Dice coefficient is better as compared to Jaccard similarity on the basis of Accuracy and MSE.

Urdu News Clustering Using K-Mean Algorithm On The Basis Of Jaccard Coefficient And Dice Coefficient Similarity

期刊

ADCAIJ-ADVANCES IN DISTRIBUTED COMPUTING AND ARTIFICIAL INTELLIGENCE JOURNAL

出版社

EDICIONES UNIV SALAMANCA

关键词

类别

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

Urdu News Clustering Using K-Mean Algorithm On The Basis Of Jaccard Coefficient And Dice Coefficient Similarity

期刊

ADCAIJ-ADVANCES IN DISTRIBUTED COMPUTING AND ARTIFICIAL INTELLIGENCE JOURNAL

出版社

EDICIONES UNIV SALAMANCA

关键词

类别

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

导出引文

分享论文