4.7 Article

Multi-domain clinical natural language processing with MedCAT: The Medical Concept Annotation Toolkit

Journal

ARTIFICIAL INTELLIGENCE IN MEDICINE
Volume 117

Publisher

ELSEVIER
DOI: 10.1016/j.artmed.2021.102083

Keywords

Electronic health record information extraction; Clinical natural language processing; Clinical concept embeddings; Clinical ontology embeddings


MedCAT is an open source Medical Concept Annotation Toolkit that provides a novel self-supervised machine learning algorithm, an annotation interface, and integrations with the broader CogStack ecosystem. It shows improved performance in extracting UMLS concepts and strong transferability between hospitals, datasets, and concept types through real-world validation with self-supervised training and fine-tuning.
Electronic health records (EHR) contain large volumes of unstructured text, requiring the application of information extraction (IE) technologies to enable clinical analysis. We present the open source Medical Concept Annotation Toolkit (MedCAT) that provides: (a) a novel self-supervised machine learning algorithm for extracting concepts using any concept vocabulary including UMLS/SNOMED-CT; (b) a feature-rich annotation interface for customizing and training IE models; and (c) integrations to the broader CogStack ecosystem for vendor-agnostic health system deployment. We show improved performance in extracting UMLS concepts from open datasets (F1: 0.448-0.738 vs 0.429-0.650). Further real-world validation demonstrates SNOMED-CT extraction at 3 large London hospitals with self-supervised training over ~8.8B words from ~17M clinical records and further fine-tuning with ~6K clinician-annotated examples. We show strong transferability (F1 > 0.94) between hospitals, datasets and concept types, indicating cross-domain EHR-agnostic utility for accelerated clinical and research use cases.
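
The F1 figures quoted above combine precision and recall of the extracted concepts. As a minimal sketch of how such a score can be computed, assuming exact matching on (start, end, concept ID) triples: the function name, the matching rule, and the sample annotations below are illustrative assumptions, not the paper's evaluation protocol or data.

```python
def f1_score(gold, predicted):
    """F1 over exact-match (start, end, concept_id) annotations.

    Assumption: a prediction counts as correct only if its span and
    concept ID both match a gold annotation exactly.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations for one document (character offsets, UMLS-style IDs).
gold = {(0, 8, "C0011849"), (20, 32, "C0020538"), (40, 46, "C0004096")}
pred = {(0, 8, "C0011849"), (20, 32, "C0020538"), (50, 55, "C0018681")}

score = f1_score(gold, pred)
print(f"F1 = {score:.3f}")
```

Here two of three predictions match two of three gold annotations, so precision and recall are both 2/3. Published evaluations may use looser span matching or per-concept averaging, which would give different numbers.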


