Article

IMPROVING MULTI-LABEL TEXT CLASSIFICATION USING WEIGHTED INFORMATION GAIN AND CO-TRAINED MULTINOMIAL NAIVE BAYES CLASSIFIER

MALAYSIAN JOURNAL OF COMPUTER SCIENCE (2022)

Journal

MALAYSIAN JOURNAL OF COMPUTER SCIENCE

Volume 35, Issue 1, Pages 21-36

Publisher

UNIV MALAYA, FAC COMPUTER SCIENCE & INFORMATION TECH

DOI: 10.22452/mjcs.vol35no1.2

Keywords

Text classification; Multi-label; Feature selection; Weighted Information Gain; Multinomial Naive Bayes

Categories

Computer Science, Artificial Intelligence; Computer Science, Theory & Methods

Funding

University of Malaya [UMRG RP059C 17SBS]

Summary

This paper examines the weighted information gain method as a way to improve text classification accuracy. The proposed algorithm combines the weighted information gain feature selection technique with a co-trained Multinomial Naive Bayes classifier, and is trained and tested on a corpus drawn from Facebook pages. The results show an improvement in classification to 61%.

Abstract

In recent years, electronic text processing systems have generated vast amounts of structured and unstructured data, leaving users to sift through a great deal of irrelevant information. Research therefore continues to seek improvements to the classification process that yield more accurate results for users. This paper examines the weighted information gain method, which re-assigns new weights to wrongly classified features to improve classification. The method focuses on the weights of the frequency bins, on the assumption that each time a given word frequency bin is iterated over, it provides information about the target word feature. Hence, the more iterations and weight re-assignments occur within a bin, the more important that bin becomes, which eventually yields better classification. The proposed algorithm was trained and tested on a corpus extracted from dedicated Facebook pages related to diabetes. The features selected by weighted information gain are then fed into a co-trained Multinomial Naive Bayes classification algorithm that captures the dependencies between labels. Because the dataset is multi-label, the algorithm incorporates class value dependencies before converting the strings into vectors, a step that allows the sparse distribution between features to be minimised and thus produces more accurate results. The results of this study show an improvement in classification to 61%.
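As a point of reference for the feature selection step, the following is a minimal sketch of plain, unweighted information gain for a term against a single binary label. It is not the paper's weighted variant: the abstract does not spell out how the frequency-bin weights are computed or re-assigned, so that part is left out. The function names, the toy posts, and the labels are all invented for illustration. In a multi-label setting, the same score could be computed once per label.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(docs, labels, term):
    """IG = H(label) - H(label | term present or absent).

    docs is a list of token sets, labels a parallel list of class values.
    This is the standard, unweighted definition (an assumption here; the
    paper's weighting over frequency bins is not reproduced).
    """
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (
        (len(with_term) / n) * entropy(with_term)
        + (len(without_term) / n) * entropy(without_term)
    )
    return entropy(labels) - conditional


# Hypothetical toy corpus: each post is a set of tokens, one binary label.
docs = [
    {"insulin", "dose"},
    {"insulin", "diet"},
    {"diet", "recipe"},
    {"recipe", "sugar"},
]
labels = [1, 1, 0, 0]

# Rank the vocabulary from most to least informative for this label.
vocabulary = sorted(set().union(*docs))
ranked = sorted(
    vocabulary,
    key=lambda t: information_gain(docs, labels, t),
    reverse=True,
)
```

In this toy data, "insulin" and "recipe" each split the posts perfectly by label and so rank first, while "diet" appears equally in both classes and ranks last. A feature selector would keep the top-ranked terms before passing them to the classifier.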
