Article

Building an OMOP common data model-compliant annotated corpus for COVID-19 clinical trials

JOURNAL OF BIOMEDICAL INFORMATICS (2021)

Journal

JOURNAL OF BIOMEDICAL INFORMATICS

Volume 118

Publisher

ACADEMIC PRESS INC ELSEVIER SCIENCE

DOI: 10.1016/j.jbi.2021.103790

Keywords

Clinical trial; Eligibility criteria; COVID-19; Structured text corpus; Machine readable dataset

Categories

Computer Science, Interdisciplinary Applications; Medical Informatics

Funding

National Library of Medicine [R01LM009886-11]
National Center for Advancing Clinical and Translational Science grants [UL1TR001873, 3U24TR001579-05]

Summary

This study used 700 COVID-19 trials to develop a semi-automatic approach for building COVIC, an annotated corpus of COVID-19 clinical trial eligibility criteria. The corpus provides a benchmark for machine-learning-based criteria extraction and supports COVID-19 trial search and analytics.

Abstract

Clinical trials are essential for generating reliable medical evidence, but they often suffer from expensive and delayed patient recruitment because unstructured eligibility criteria descriptions prevent automatic query generation for eligibility screening. In response to the COVID-19 pandemic, many trials have been created, but their information is not computable. We included 700 COVID-19 trials available at the point of study and developed a semi-automatic approach to generate an annotated corpus for COVID-19 clinical trial eligibility criteria called COVIC. A hierarchical annotation schema based on the OMOP Common Data Model was developed to accommodate four levels of annotation granularity: study cohort, eligibility criteria, named entity, and standard concept. In COVIC, 39 trials with more than one study cohort were identified and labelled with an identifier for each cohort. 1,943 criteria for non-clinical characteristics, such as informed consent and exclusivity of participation, were annotated. 9,767 criteria were represented by 18,161 entities in 8 domains, 7,743 attributes of 7 attribute types, and 16,443 relationships of 11 relationship types. 17,171 entities were mapped to standard medical concepts and 1,009 attributes were normalized into computable representations. COVIC can serve as a corpus indexed by semantic tags for COVID-19 trial search and analytics, and as a benchmark for machine-learning-based criteria extraction.
