Iterative Named Entity Recognition with Conditional Random Fields

APPLIED SCIENCES-BASEL (2022)

Journal

APPLIED SCIENCES-BASEL

Volume 12, Issue 1, Pages -

Publisher

MDPI

DOI: 10.3390/app12010330

Keywords

active learning; self-learning; text; annotation; language

Categories

Chemistry, Multidisciplinary; Engineering, Multidisciplinary; Materials Science, Multidisciplinary; Physics, Applied

Abstract

Named entity recognition (NER) is an important step in processing unstructured text, both for information extraction and for the computer-supported analysis of large amounts of digital data with machine learning methods. However, NER often relies on domain-specific knowledge and is conducted manually, which is intensive in time and human resources. Statistical models that perform NER automatically can reduce this effort. The current work investigates whether Conditional Random Fields (CRF) can be trained efficiently for NER in German texts by means of an iterative procedure that combines self-learning with a manual annotation (active learning) component. The training dataset grows continuously over the iterations. Whilst self-learning did not markedly improve the performance of the CRF for NER, manually annotating the sentences with the lowest probability of correct prediction clearly improved the model's F1-score and simultaneously reduced the amount of manual annotation required to train the model. A model with an F1-score of 0.885 was trained in 11.4 h.
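The selection step of one iteration can be illustrated with a minimal sketch. It assumes the model assigns each unlabelled sentence a probability that its predicted label sequence is correct. The abstract does not say how this probability is computed, how many sentences are annotated per iteration, or whether a threshold is used for self-learning, so the function name `split_by_confidence`, the parameters `n_manual` and `self_learning_threshold`, and the example values are all hypothetical.

```python
from typing import List, Tuple


def split_by_confidence(
    confidences: List[float],
    n_manual: int,
    self_learning_threshold: float,
) -> Tuple[List[int], List[int]]:
    """Pick sentences for manual annotation and for self-learning.

    confidences[i] is an assumed model estimate of the probability that
    the predicted labels of sentence i are correct.
    """
    # Rank sentence indices from least to most confident.
    ranked = sorted(range(len(confidences)), key=lambda i: confidences[i])

    # Active learning: the least confident sentences go to a human annotator.
    manual_idx = ranked[:n_manual]
    manual_set = set(manual_idx)

    # Self-learning: confident predictions are accepted as training labels.
    auto_idx = [
        i
        for i in ranked
        if i not in manual_set and confidences[i] >= self_learning_threshold
    ]
    return manual_idx, auto_idx


# Hypothetical per-sentence confidences for five unlabelled sentences.
confidences = [0.98, 0.41, 0.87, 0.12, 0.99]
manual_idx, auto_idx = split_by_confidence(
    confidences, n_manual=2, self_learning_threshold=0.95
)
print(manual_idx)  # [3, 1]
print(auto_idx)    # [0, 4]
```

In a full loop, both index lists would be added to the training set and the CRF retrained before the next iteration. According to the abstract, only the manually annotated low-confidence sentences clearly improved the F1-score.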
