4.7 Article

A network-based feature extraction model for imbalanced text data

Journal

EXPERT SYSTEMS WITH APPLICATIONS
Volume 195

Publisher

PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.eswa.2022.116600

Keywords

Complex Network; CNN; Text Analysis; Imbalanced Data; Random Walk

Funding

  1. National Natural Science Foundation of China [71942006, 71621001]
  2. Research Foundation of State Key Laboratory of Railway Traffic Control and Safety, China, Beijing Jiaotong University [RCS2021ZT001]

This paper introduces a network-based Convolutional Neural Network (NCNN) to address the issue of imbalanced data. The proposed model mitigates the effect of imbalanced data by generating new synthetic samples and improves performance by introducing an additional layer. The effectiveness of the proposed NCNN model on imbalanced text data is demonstrated through experiments.
The explosive growth of text data has led many researchers to explore efficient methods for extracting valuable hidden information. Many technologies, especially deep learning methods, have achieved great success in text analysis. However, the most powerful methods typically require a considerable quantity of training data, and their performance may suffer when that data is imbalanced. In this paper, we propose a network-based Convolutional Neural Network (NCNN) to mitigate the effect of imbalanced data. The proposed model first generates new synthetic samples for the imbalanced data using random walks on the network. Then an extra layer, called the Polar Layer, is introduced to connect the output of the network model of the text to the classical CNN. Two electing strategies (n-NCNN and x-NCNN) are proposed to further improve the performance of NCNN. In the experimental section, the proposed model is applied to the Reuters-21578 and WebKB datasets. A comparison with six other approaches demonstrates the effectiveness of the proposed NCNN model on imbalanced text data.
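The abstract does not specify how the text network is built or how the walks are turned into samples, so the following is only a minimal sketch of the general idea of random-walk oversampling, not the authors' NCNN procedure. It assumes a word-succession graph built from tokenized minority-class documents, with edge weights equal to how often one token follows another; the function names (`build_cooccurrence_graph`, `random_walk`, `oversample_minority`) and the toy documents are hypothetical.

```python
import random
from collections import defaultdict


def build_cooccurrence_graph(docs):
    """Assumed network model: edge a -> b is weighted by how often b follows a."""
    graph = defaultdict(lambda: defaultdict(int))
    for tokens in docs:
        for a, b in zip(tokens, tokens[1:]):
            graph[a][b] += 1
    return graph


def random_walk(graph, start, length, rng):
    """Walk up to `length` nodes, choosing each next node in proportion to edge weight."""
    walk = [start]
    while len(walk) < length:
        neighbors = graph.get(walk[-1])
        if not neighbors:  # dead end: the current token has no outgoing edge
            break
        nodes = list(neighbors)
        weights = [neighbors[n] for n in nodes]
        walk.append(rng.choices(nodes, weights=weights, k=1)[0])
    return walk


def oversample_minority(minority_docs, n_new, length, seed=0):
    """Generate n_new synthetic token sequences for the minority class."""
    rng = random.Random(seed)
    graph = build_cooccurrence_graph(minority_docs)
    # Assumption: each walk starts at the first token of some real document.
    starts = [doc[0] for doc in minority_docs if doc]
    return [random_walk(graph, rng.choice(starts), length, rng) for _ in range(n_new)]


if __name__ == "__main__":
    # Toy minority-class documents (illustrative only, not from the paper's datasets).
    minority = [["oil", "price", "rises"], ["oil", "price", "falls"]]
    for sample in oversample_minority(minority, n_new=3, length=3):
        print(" ".join(sample))
```

In this sketch, every synthetic sequence follows only transitions observed in real minority-class text, which keeps the generated samples close to the minority vocabulary. How the paper's Polar Layer consumes the network output is not described in the abstract and is not modeled here.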
