Article

EPIC: An epidemiological investigation of COVID-19 dataset for Chinese named entity recognition

Publisher

ELSEVIER SCI LTD
DOI: 10.1016/j.ipm.2023.103541

Keywords

Named entity recognition; Epidemiological investigation; Chinese character structural features; COVID-19

This paper proposes a new three-stage annotation method and uses it to construct EPIC, an epidemiological investigation dataset for Chinese named entity recognition (CNER), so that COVID-19 epidemiological investigation reports can be analyzed and used more effectively. The SECCSF method improves the accuracy of segmentation boundary detection and entity category determination. Experiments show the effectiveness of the SECCSF method on the EPIC dataset.
The outbreak of COVID-19 has had a huge impact on the whole world. In China, a large number of epidemiological investigation reports have been produced in response to COVID-19. To analyze and utilize these reports more effectively for future large-scale epidemics, this paper proposes a new three-stage annotation method and uses it to construct EPIC (EPidemiological Investigation of COVID-19), an epidemiological investigation dataset for Chinese named entity recognition (CNER). EPIC contains 10 categories of named entities, focusing on the travel history of confirmed cases. The EPIC corpus consists of 226 official epidemiological investigation reports, and its inter-annotator agreement reaches 0.97. Based on EPIC, this paper proposes the Semantic Embedding with Chinese Character Structural Features (SECCSF) method to improve the accuracy of segmentation boundary detection and entity category determination in CNER. In the experimental phase, several baselines are implemented and evaluated on EPIC. The baseline with the SECCSF method achieves an F1 value of 0.892, which indicates the effectiveness of the SECCSF method for the NER task on EPIC. EPIC is released at: https://github.com/tinyyhorm/EPIC.
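
The abstract reports an F1 value without stating how it is computed. As an illustration only, the sketch below shows one common convention for NER: entity-level, exact-match F1 over character-level BIO tags. The BIO scheme, the exact-match criterion, and the labels `LOC` and `TIME` are assumptions made for this example; they are not taken from the paper and may differ from the EPIC categories and evaluation protocol.

```python
from typing import List, Optional, Set, Tuple

# (start index, end index exclusive, entity type); labels here are hypothetical
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from a BIO tag sequence (one tag per character)."""
    spans: Set[Span] = set()
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes the last span
        # Close the open span unless this tag continues it.
        if label is not None and tag != f"I-{label}":
            spans.add((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Micro-averaged F1: a predicted entity counts only if its boundaries
    and its type both match a gold entity exactly (assumed criterion)."""
    tp = n_gold = n_pred = 0
    for g_tags, p_tags in zip(gold, pred):
        g, p = bio_to_spans(g_tags), bio_to_spans(p_tags)
        tp += len(g & p)
        n_gold += len(g)
        n_pred += len(p)
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)


# One sentence of five characters: the second entity has a boundary error.
gold = [["B-LOC", "I-LOC", "O", "B-TIME", "I-TIME"]]
pred = [["B-LOC", "I-LOC", "O", "B-TIME", "O"]]
print(entity_f1(gold, pred))
```

Under this criterion a boundary error and a category error are penalized alike, which is why the abstract's two stated goals, segmentation boundary detection and entity category determination, both bear on the reported score.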
