4.6 Article

Iterative Named Entity Recognition with Conditional Random Fields

Journal

APPLIED SCIENCES-BASEL
Volume 12, Issue 1, Pages -

Publisher

MDPI
DOI: 10.3390/app12010330

Keywords

active learning; self-learning; text; annotation; language


Named entity recognition (NER) is an important step in processing unstructured text content. This study investigates whether Conditional Random Fields (CRF) can be trained efficiently for NER in German texts through an iterative process. Combining self-learning with manual annotation (active learning) improves the model's F1-score and reduces the amount of manual annotation required to train the model. A model with an F1-score of 0.885 was trained in 11.4 hours.
Named entity recognition (NER) is an important step in processing unstructured text content, both for information extraction and for the computer-supported analysis of large amounts of digital data with machine learning methods. However, NER often relies on domain-specific knowledge and is carried out manually, which costs considerable time and human resources. These costs can be reduced with statistical models that perform NER automatically. The current work investigates whether Conditional Random Fields (CRF) can be trained efficiently for NER in German texts by means of an iterative procedure that combines self-learning with a manual annotation (active learning) component. The training dataset grows continuously over the iterations. Whilst self-learning did not markedly improve the performance of the CRF for NER, manually annotating the sentences with the lowest probability of correct prediction clearly improved the model's F1-score and simultaneously reduced the amount of manual annotation required to train the model. A model with an F1-score of 0.885 was trained in 11.4 h.
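
The selection step of such an iterative procedure can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `split_pool`, the parameters `n_manual` and `self_threshold`, and the toy confidence values are all assumptions introduced here. The sketch assumes the model supplies, for each unlabeled sentence, a probability that its predicted label sequence is correct; the least confident sentences are routed to manual annotation (active learning), and sufficiently confident predictions are accepted as training labels (self-learning).

```python
from typing import List, Sequence, Tuple


def split_pool(
    confidences: Sequence[float],
    n_manual: int,
    self_threshold: float,
) -> Tuple[List[int], List[int]]:
    """Pick sentence indices for one iteration (hypothetical helper).

    confidences[i] is the model's probability that its predicted label
    sequence for sentence i is correct. A CRF could supply the probability
    of its best label sequence here; that choice is an assumption.
    """
    # Rank sentences from least to most confident; ties break on index.
    ranked = sorted(range(len(confidences)), key=lambda i: (confidences[i], i))

    # Active learning: the least confident sentences go to a human annotator.
    manual = ranked[:n_manual]
    manual_set = set(manual)

    # Self-learning: confident predictions are accepted as training labels.
    auto = sorted(
        i for i in ranked
        if confidences[i] >= self_threshold and i not in manual_set
    )
    return manual, auto


# Toy confidences for four unlabeled sentences (made-up values).
toy_confidences = [0.40, 0.99, 0.55, 0.97]
manual_idx, auto_idx = split_pool(toy_confidences, n_manual=1, self_threshold=0.95)
print(manual_idx, auto_idx)  # prints: [0] [1, 3]
```

In a full loop, the sentences in `manual_idx` (with human labels) and `auto_idx` (with predicted labels) would be added to the training set, the model retrained, and the remaining pool rescored, so the training data grows with each iteration.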
