4.7 Article

Epidemiologic information discovery from open-access COVID-19 case reports via pretrained language model

Journal

ISCIENCE
Volume 25, Issue 10

Publisher

CELL PRESS
DOI: 10.1016/j.isci.2022.105079

Funding

  1. National Natural Science Foundation of China [61773091, 62173065, 72025405, 82041020, 91846301]
  2. Liaoning Revitalization Talents Program [XLYC1807106]
  3. Grand Challenges ICODA pilot initiative
  4. Minderoo Foundation
  5. Japan Society for the Promotion of Science KAKENHI [18H03336]
  6. Fundamental Research Funds for the Central Universities [DUT22ZD205]
  7. Bill & Melinda Gates Foundation

This paper proposes a computational framework that can automatically extract epidemiological information from open-access COVID-19 case reports, and provides an open-access online platform to implement the algorithm.
Although open-access data are increasingly common and useful to epidemiological research, curation of such datasets is resource-intensive and time-consuming. The regularly disclosed case reports, despite being a major source of COVID-19 data, were often written in natural language in an unstructured format. Here we propose a computational framework that can automatically extract epidemiological information from open-access COVID-19 case reports. We develop this framework by coupling a language model built on deep neural networks with training samples compiled using an optimized data annotation strategy. When applied to the COVID-19 case reports collected from mainland China, our novel framework outperforms all other state-of-the-art deep learning models. The information extracted by our approach is highly consistent with that obtained from gold-standard manual coding, with a matching rate of 80%. To implement our algorithm, we provide an open-access online platform that can accurately estimate epidemiological statistics in real time with a substantially reduced data curation burden.
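
To make the reported matching rate concrete, the following is a minimal sketch of how agreement between automatically extracted fields and manual coding could be scored. The abstract does not define the metric, so the field-level exact-match definition used here is an assumption, and the function name `matching_rate`, the field names, and the sample records are all hypothetical rather than taken from the paper.

```python
from typing import Dict, List

Record = Dict[str, str]


def matching_rate(extracted: List[Record], manual: List[Record], fields: List[str]) -> float:
    """Share of (record, field) pairs where the extracted value equals the manual code.

    Assumption: a match is a case-insensitive exact string match per field.
    The paper may define its matching rate differently.
    """
    if len(extracted) != len(manual):
        raise ValueError("extracted and manual must contain the same number of records")
    total = len(manual) * len(fields)
    if total == 0:
        raise ValueError("need at least one record and one field")
    matches = sum(
        1
        for ext, gold in zip(extracted, manual)
        for field in fields
        if ext.get(field, "").strip().lower() == gold.get(field, "").strip().lower()
    )
    return matches / total


# Hypothetical records: two case reports, three fields each.
fields = ["age", "sex", "onset_date"]
manual = [
    {"age": "45", "sex": "male", "onset_date": "2020-01-20"},
    {"age": "62", "sex": "female", "onset_date": "2020-01-23"},
]
extracted = [
    {"age": "45", "sex": "Male", "onset_date": "2020-01-20"},
    {"age": "62", "sex": "female", "onset_date": "2020-01-25"},  # one mismatch
]

rate = matching_rate(extracted, manual, fields)
print(f"matching rate: {rate:.2%}")
```

Exact matching per field is the simplest choice; a real evaluation might instead normalize dates or allow partial credit for free-text fields such as travel history.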
