Article

A Chinese text classification based on active

APPLIED SOFT COMPUTING (2024)

Journal

APPLIED SOFT COMPUTING

Volume 150

Publisher

ELSEVIER

DOI: 10.1016/j.asoc.2023.111067

Keywords

Natural language processing; Deep active learning; Hierarchical confidence; Power text; Knowledge graph

Automated Summary

In this paper, we propose a Chinese text classification algorithm for the power system based on deep active learning, which addresses the challenge of specialized text classification. By applying a hierarchical confidence strategy, our model achieves higher classification accuracy with less labeled training data.

Abstract

Constructing a knowledge graph benefits grid production, electrical safety protection, fault diagnosis, and traceability in an observable and controllable way. A high-precision text classification algorithm is crucial for building a professional knowledge graph for the power system. Unfortunately, power business systems contain a large number of poorly described, specialized texts, and only a small share of these texts carry valid labels. This makes it very challenging to improve the precision of text classification models. To close this gap, we propose a classification algorithm for Chinese text in the power system based on deep active learning (CCTP-DAL). Our core idea is to apply a hierarchical confidence strategy to a deep active learning model in order to balance the trade-off between the amount of training data and text classification accuracy. CCTP-DAL (1) trains a BERT model on a small amount of labeled data to calculate the confidence level of each short text, (2) selects high-confidence text data with optimal model generalization capability based on the hierarchical confidence level, and (3) fuses deep learning models and active learning strategies to ensure high text classification accuracy with less labeled training data. We benchmark our model on real data crawled from the web in extensive experiments. The experimental results demonstrate that the proposed model achieves higher text classification accuracy with less labeled training data than other deep learning models.
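The abstract does not spell out how the hierarchical confidence levels are defined, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that confidence is the largest class probability produced by the classifier, that texts are split into three tiers by two hypothetical thresholds (`high` and `low`), that high-confidence texts keep their predicted label, and that the least confident remaining texts are sent to a human annotator within a labeling budget. All function names, thresholds, and the budget parameter are illustrative assumptions.

```python
from typing import Dict, List, Sequence, Tuple


def confidence(probs: Sequence[float]) -> float:
    """Confidence of one prediction, taken here as the largest class probability."""
    return max(probs)


def split_by_confidence(
    prob_rows: Sequence[Sequence[float]],
    high: float = 0.9,  # hypothetical threshold, not from the paper
    low: float = 0.6,   # hypothetical threshold, not from the paper
) -> Dict[str, List[int]]:
    """Group sample indices into high / medium / low confidence tiers."""
    tiers: Dict[str, List[int]] = {"high": [], "medium": [], "low": []}
    for idx, probs in enumerate(prob_rows):
        c = confidence(probs)
        if c >= high:
            tiers["high"].append(idx)
        elif c >= low:
            tiers["medium"].append(idx)
        else:
            tiers["low"].append(idx)
    return tiers


def plan_round(
    prob_rows: Sequence[Sequence[float]],
    budget: int,
    high: float = 0.9,
    low: float = 0.6,
) -> Tuple[Dict[int, int], List[int]]:
    """Plan one active-learning round (assumed policy).

    Returns a mapping {sample index: predicted class} for high-confidence
    samples, and a list of at most `budget` sample indices to hand to a
    human annotator, least confident first.
    """
    tiers = split_by_confidence(prob_rows, high, low)
    # High tier: accept the model's own prediction as the label.
    pseudo_labels = {
        i: max(range(len(prob_rows[i])), key=prob_rows[i].__getitem__)
        for i in tiers["high"]
    }
    # Remaining tiers: query the least confident samples first.
    remaining = sorted(
        tiers["low"] + tiers["medium"],
        key=lambda i: confidence(prob_rows[i]),
    )
    return pseudo_labels, remaining[:budget]


if __name__ == "__main__":
    # Toy class probabilities standing in for a classifier's softmax outputs.
    rows = [
        [0.95, 0.03, 0.02],
        [0.50, 0.30, 0.20],
        [0.70, 0.20, 0.10],
        [0.34, 0.33, 0.33],
    ]
    pseudo, to_annotate = plan_round(rows, budget=2)
    print("pseudo-labeled:", pseudo)
    print("send to annotator:", to_annotate)
```

In a full loop, the newly labeled texts would be added to the training set and the classifier retrained before the next round. The probabilities here are toy values, and a real setup would take them from the fine-tuned model.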
