An open natural language processing (NLP) framework for EHR-based clinical research: a case demonstration using the National COVID Cohort Collaborative (N3C)

JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION (2023)

Journal

JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION

Publisher

OXFORD UNIV PRESS

DOI: 10.1093/jamia/ocad134

Keywords

electronic health records; natural language processing; federated learning; multi-institutional data annotation

Automated Summary

Despite recent advancements in clinical natural language processing (NLP), the adoption of clinical NLP models in translational research is hindered by process heterogeneity and human factor variations. Developing NLP models in multi-site settings is challenging, but essential for algorithm robustness and generalizability. This study reports on the development of an NLP solution for COVID-19 sign and symptom extraction using an open NLP framework, highlighting the benefits of multi-site data and the need for federated annotation and evaluation to overcome challenges.

Abstract

Despite recent methodological advancements in clinical natural language processing (NLP), the adoption of clinical NLP models within the translational research community remains hindered by process heterogeneity and human factor variations. These factors also dramatically increase the difficulty of developing NLP models in multi-site settings, even though multi-site development is necessary for algorithm robustness and generalizability. Here, we report on our experience developing an NLP solution for Coronavirus Disease 2019 (COVID-19) sign and symptom extraction in an open NLP framework, using data from a subset of sites participating in the National COVID Cohort Collaborative (N3C). We then empirically highlight the benefits of multi-site data for both symbolic and statistical methods, as well as the need for federated annotation and evaluation to resolve several pitfalls encountered in the course of these efforts.
