4.7 Article

TF-IGM revisited: Imbalance text classification with relative imbalance ratio

Journal

EXPERT SYSTEMS WITH APPLICATIONS
Volume 217, Issue -, Pages -

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.eswa.2023.119578

Keywords

Class imbalance; Term weighting; Inverse gravity moment; Relative imbalance ratio

Inverse gravity moment (IGM) is a text classification weighting scheme that focuses on distinguishing terms. However, it fails to address the issue of class imbalance in text classification, where minority classes are often neglected. We propose a method using relative imbalance ratio (RIR) to amplify the scores of terms from minority classes in IGM. Experimental results demonstrate that our method outperforms the original IGM, improved IGM, and other state-of-the-art term weighting schemes in terms of f1-macro results without compromising f1-micro.
Inverse gravity moment (IGM) is a recent term weighting scheme in the text classification literature. The idea is that a distinguishing term should concentrate in one class, or at most a limited number of classes. IGM considers the document frequencies of a term over all classes. However, it cannot handle the class imbalance problem. The natural distribution of documents in text classification is frequently imbalanced. Classifiers generally tend to be biased toward majority classes, that is, classes with many samples. Therefore, documents from minority classes might be ignored. In this study, we tackle the class imbalance problem in IGM and propose using a factor called the relative imbalance ratio (RIR). The aim of the RIR coefficient is to scale the document frequencies of terms from minority classes in order to amplify the IGM score of those terms. Otherwise, those terms might be dwarfed because majority classes have many more documents. Experimental results on three data sets, two of which are imbalanced, show that our proposed method manages to outperform the original IGM method as well as the improved IGM (IIGM) and seven other state-of-the-art term weighting schemes (TF-ICF, TF-ICSDF, TF-RF, TF-PROB, TF-MONO, RE, AFE-MERT) in terms of f1-macro results while not compromising f1-micro.
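
To make the scaling idea concrete, here is a minimal Python sketch. The `igm` function follows the standard IGM definition (the largest class document frequency divided by the rank-weighted sum of all class document frequencies). The abstract does not give the RIR formula, so the scaling factor in `scaled_igm` (largest class size divided by the term's class size) is only an illustrative assumption, not the authors' definition. The function names and the example counts are hypothetical.

```python
def igm(class_doc_freqs):
    """Standard IGM: f_1 / sum_r(f_r * r), with frequencies ranked in descending order."""
    ranked = sorted(class_doc_freqs, reverse=True)
    denom = sum(f * r for r, f in enumerate(ranked, start=1))
    return ranked[0] / denom if denom > 0 else 0.0


def scaled_igm(class_doc_freqs, class_sizes):
    """IGM computed on document frequencies scaled by a per-class imbalance factor.

    ASSUMPTION: the factor (largest class size / this class size) is a stand-in
    for the paper's RIR coefficient, whose exact form is not stated in the abstract.
    """
    largest = max(class_sizes)
    scaled = [f * largest / n for f, n in zip(class_doc_freqs, class_sizes)]
    return igm(scaled)


# Hypothetical term: appears in 20 of 1000 majority-class documents
# and in 8 of 10 minority-class documents.
doc_freqs = [20, 8]
class_sizes = [1000, 10]

plain = igm(doc_freqs)                       # raw counts favor the majority class
adjusted = scaled_igm(doc_freqs, class_sizes)  # scaling lets the minority class dominate
print(f"plain IGM: {plain:.3f}, scaled IGM: {adjusted:.3f}")
```

With raw counts, the term looks only moderately concentrated because the majority class contributes the larger frequency. After scaling, the minority class ranks first and the score rises, which is the effect the abstract describes for terms that would otherwise be dwarfed.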
