Article

Electric Power Audit Text Classification With Multi-Grained Pre-Trained Language Model

Journal

IEEE ACCESS
Volume 11, Pages 13510-13518

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/ACCESS.2023.3240162

Keywords

Power systems; Task analysis; Text categorization; Bit error rate; Data models; Computational modeling; Natural language processing; Pre-trained language model; text classification; electric power audit text; natural language processing; masked language model

Abstract
Electric power audit text classification is an important research problem in electric power systems. Recently, a variety of automatic classification methods based on machine learning or deep learning models have been applied to these texts. Advances in computing have made pre-training followed by fine-tuning the prevailing paradigm for text classification, achieving better results than earlier fully supervised models. According to pre-training theory, domain-related pre-training tasks can enhance the performance of downstream tasks in that domain. However, existing pre-trained models are usually trained on general corpora and do not use texts from the electric power field, especially electric power audit texts. As a result, the model learns little electric-power-related morphology or semantics during pre-training, so less domain information is available during fine-tuning. Motivated by this gap, this paper proposes EPAT-BERT, a BERT-based model pre-trained with two pre-training tasks of different granularity: a word-level masked language model and an entity-level masked language model. These tasks predict masked words and entities in electric-power-related texts, allowing the model to learn rich morphology and semantics about electric power. EPAT-BERT is then fine-tuned for the electric power audit text classification task. The experimental results show that, compared with fully supervised machine learning models, neural network models, and general pre-trained language models, EPAT-BERT significantly outperforms existing models across a variety of evaluation metrics, so it can be further applied to electric power audit text classification. Ablation studies further confirm the effectiveness of each component of EPAT-BERT and support our motivations.
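
To make the two masking granularities concrete, the short Python sketch below contrasts word-level masking (individual tokens) with entity-level masking (whole entity spans). It is a minimal illustration, assuming hypothetical helper names (mask_word_level, mask_entity_level), illustrative masking ratios, and a hand-labeled entity span; the paper's actual pre-training corpus, tokenizer, and entity recognizer are not reproduced here.

import random
from typing import List, Tuple

MASK = "[MASK]"

def mask_word_level(tokens: List[str], ratio: float = 0.15, seed: int = 0) -> Tuple[List[str], List[int]]:
    """Word-level MLM: mask a random fraction of individual tokens, as in standard BERT."""
    rng = random.Random(seed)
    masked = list(tokens)
    target_positions = []
    for i in range(len(tokens)):
        if rng.random() < ratio:
            masked[i] = MASK
            target_positions.append(i)
    return masked, target_positions

def mask_entity_level(tokens: List[str], entity_spans: List[Tuple[int, int]],
                      ratio: float = 0.5, seed: int = 0) -> Tuple[List[str], List[int]]:
    """Entity-level MLM: mask whole (start, end) entity spans, so the model must
    predict complete electric-power entities rather than isolated tokens."""
    rng = random.Random(seed)
    masked = list(tokens)
    target_positions = []
    for start, end in entity_spans:
        if rng.random() < ratio:
            for i in range(start, end):
                masked[i] = MASK
                target_positions.append(i)
    return masked, target_positions

if __name__ == "__main__":
    # Toy audit sentence; "transformer substation" is treated as one entity span (1, 3).
    tokens = "the transformer substation audit found a metering deviation".split()
    print(mask_word_level(tokens))
    print(mask_entity_level(tokens, entity_spans=[(1, 3)]))

In the word-level task the model only fills in scattered tokens, while in the entity-level task it must recover a full domain entity from its context, which is the mechanism the abstract credits for learning richer electric-power semantics.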
