Journal
INFORMATION PROCESSING & MANAGEMENT
Volume 59, Issue 4, Pages -
Publisher
ELSEVIER SCI LTD
DOI: 10.1016/j.ipm.2022.102946
Keywords
Text classification; Graph Neural Networks; Graph contrastive learning; Data augmentation
Funding
- National Natural Science Foundation of China [61872161, 61976103]
- Foundation of the National Key Research and Development of China [2021ZD0112500]
- Nature Science Foundation of Jilin Province [20200201297JC]
- Foundation of Development and Reform of Jilin Province [2019C053-8]
- Foundation of Jilin Educational Committee [JJKH20191257KJ]
- Interdisciplinary and integrated innovation of JLU [JLUXKJC2020207]
- Fundamental Research Funds for the Central Universities, JLU
CGA2TC is a graph-based model for text classification that combines contrastive learning with an adaptive augmentation strategy to obtain more robust node representations. It constructs a text graph from word co-occurrence and document-word relationships, and its augmentation strategy is designed to address the noise in that graph while preserving its essential structure. The model treats labeled and unlabeled nodes differently and uses random sampling to reduce the resource cost of contrastive learning. Experimental results demonstrate the effectiveness of CGA2TC on text classification tasks.
Text classification is an important research topic in natural language processing (NLP), and Graph Neural Networks (GNNs) have recently been applied to this task. However, in existing graph-based models, text graphs constructed by rules are not real graph data and introduce massive noise. More importantly, with a fixed corpus-level graph structure, these models cannot sufficiently exploit the labeled and unlabeled information of nodes. Meanwhile, contrastive learning has been developed as an effective method in the graph domain for fully utilizing the information of nodes. Therefore, we propose a new graph-based model for text classification named CGA2TC, which introduces contrastive learning with an adaptive augmentation strategy to obtain more robust node representations. First, we explore word co-occurrence and document-word relationships to construct a text graph. Then, we design an adaptive augmentation strategy for the noisy text graph to generate two contrastive views that effectively address the noise problem and preserve the essential structure. Specifically, we design noise-based and centrality-based augmentation strategies on the topological structure of the text graph to perturb the unimportant connections and thus highlight the relatively important edges. For the labeled nodes, we take the nodes with the same label as multiple positive samples and assign them to the anchor node, while we employ consistency training on unlabeled nodes to constrain model predictions. Finally, to reduce the resource consumption of contrastive learning, we adopt a random sampling method to select a subset of nodes on which to calculate the contrastive loss. The experimental results on several benchmark datasets demonstrate the effectiveness of CGA2TC on the text classification task.
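The abstract does not give the formulas behind these steps, so the following is only a minimal NumPy sketch of two of them under stated assumptions, not the authors' implementation. It illustrates (a) a centrality-based edge perturbation that drops low-importance edges more often, and (b) a label-aware contrastive loss computed on a random sample of nodes. The function names, the degree-based importance score, the `max_drop` cap, and the `temperature` value are all hypothetical choices made for illustration.

```python
import numpy as np

def drop_edges_by_centrality(edges, num_nodes, max_drop=0.7, rng=None):
    """Return a perturbed edge list; less central edges are dropped more often.

    Assumption: edge importance is the mean log-degree of its two endpoints.
    """
    rng = np.random.default_rng() if rng is None else rng
    edges = np.asarray(edges)
    degree = np.bincount(edges.ravel(), minlength=num_nodes).astype(float)
    score = np.log1p(degree[edges]).mean(axis=1)
    span = score.max() - score.min()
    if span > 0:
        # The most important edge gets drop probability 0, the least gets max_drop.
        drop_prob = (score.max() - score) / span * max_drop
    else:
        drop_prob = np.full(len(edges), max_drop / 2)
    keep = rng.random(len(edges)) >= drop_prob
    return edges[keep]

def sampled_supervised_contrastive_loss(z1, z2, labels, sample_size,
                                        temperature=0.5, rng=None):
    """Label-aware contrastive loss between two views on a random node sample.

    z1, z2: (num_nodes, dim) embeddings of the same nodes under two views
            (assumed to have non-zero rows).
    labels: (num_nodes,) integer class labels.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = len(labels)
    # Random sampling keeps the similarity matrix at sample_size x sample_size.
    idx = rng.choice(n, size=min(sample_size, n), replace=False)
    a = z1[idx] / np.linalg.norm(z1[idx], axis=1, keepdims=True)
    b = z2[idx] / np.linalg.norm(z2[idx], axis=1, keepdims=True)
    logits = a @ b.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    y = labels[idx]
    # Every sampled node sharing the anchor's label counts as a positive,
    # including the anchor's own copy in the other view.
    positives = (y[:, None] == y[None, :]).astype(float)
    per_anchor = (positives * log_prob).sum(axis=1) / positives.sum(axis=1)
    return float(-per_anchor.mean())
```

In a full training loop, the two views would come from two independent calls to an augmentation such as `drop_edges_by_centrality`, each passed through the same GNN encoder; the consistency term for unlabeled nodes mentioned in the abstract is not shown here.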