medExtractR: A targeted, customizable approach to medication extraction from electronic health records

JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION (2020)

Journal

JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION

Volume 27, Issue 3, Pages 407-418

Publisher

OXFORD UNIV PRESS

DOI: 10.1093/jamia/ocz207

Keywords

natural language processing; medication extraction; real world data; medication population study

Categories

Computer Science, Information Systems; Computer Science, Interdisciplinary Applications; Health Care Sciences & Services; Information Science & Library Science; Medical Informatics

Funding

National Institute of General Medical Sciences [R01-GM123109]

Abstract

Objective: We developed medExtractR, a natural language processing system to extract medication information from clinical notes. Using a targeted approach, medExtractR focuses on individual drugs to facilitate creation of medication-specific research datasets from electronic health records.

Materials and Methods: Written using the R programming language, medExtractR combines lexicon dictionaries and regular expressions to identify relevant medication entities (eg, drug name, strength, frequency). MedExtractR was developed on notes from Vanderbilt University Medical Center, using medications prescribed with varying complexity. We evaluated medExtractR and compared it with 3 existing systems: MedEx, MedXN, and CLAMP (Clinical Language Annotation, Modeling, and Processing). We also demonstrated how medExtractR can be easily tuned for better performance on an outside dataset using the MIMIC-III (Medical Information Mart for Intensive Care III) database.

Results: On 50 test notes per development drug and 110 test notes for an additional drug, medExtractR achieved high overall performance (F-measures >0.95), exceeding performance of the 3 existing systems across all drugs. MedExtractR achieved the highest F-measure for each individual entity, except drug name and dose amount for allopurinol. With tuning and customization, medExtractR achieved F-measures >0.90 in the MIMIC-III dataset.

Discussion: The medExtractR system successfully extracted entities for medications of interest. High performance in entity-level extraction provides a strong foundation for developing robust research datasets for pharmacological research. When working with new datasets, medExtractR should be tuned on a small sample of notes before being broadly applied.

Conclusions: The medExtractR system achieved high performance extracting specific medications from clinical text, leading to higher-quality research datasets for drug-related studies than some existing general-purpose medication extraction tools.
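
The abstract describes the general technique (lexicon dictionaries plus regular expressions, applied to one drug of interest, scored by F-measure) without giving implementation details. The Python sketch below illustrates that general idea only. It is not medExtractR's code or API (the package itself is written in R), and the tiny lexicons, the strength pattern, the fixed search window, and the function names `extract_entities` and `f_measure` are all hypothetical choices made for illustration.

```python
import re

# Hypothetical mini-lexicons (assumptions); a real system would use far larger dictionaries.
DRUG_NAMES = ["tacrolimus", "prograf"]
FREQUENCY_TERMS = ["twice daily", "twice a day", "every 12 hours", "daily", "bid"]

# Assumed strength pattern: a number followed by a unit, e.g. "0.5 mg" or "1mg".
STRENGTH_RE = re.compile(r"\b\d+(?:\.\d+)?\s?(?:mg|mcg|g|ml)\b", re.IGNORECASE)


def _lexicon_regex(terms):
    """Compile a lexicon into one alternation, longest terms first,
    so that "twice daily" is preferred over the shorter "daily"."""
    ordered = sorted(terms, key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(map(re.escape, ordered)) + r")\b",
                      re.IGNORECASE)


DRUG_RE = _lexicon_regex(DRUG_NAMES)
FREQ_RE = _lexicon_regex(FREQUENCY_TERMS)


def extract_entities(note, window=60):
    """Return (entity_type, text, start, end) tuples for each drug mention
    and for strength/frequency expressions in a window after it."""
    entities = []
    for drug in DRUG_RE.finditer(note):
        entities.append(("DrugName", drug.group(), drug.start(), drug.end()))
        # Targeted search: only look at the text shortly after the drug name.
        lo = drug.end()
        segment = note[lo:lo + window]
        for label, pattern in (("Strength", STRENGTH_RE), ("Frequency", FREQ_RE)):
            for m in pattern.finditer(segment):
                entities.append((label, m.group(), lo + m.start(), lo + m.end()))
    return entities


def f_measure(predicted, gold):
    """Balanced F-measure (F1) over exact-match entity tuples."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    for entity in extract_entities("Continue tacrolimus 1 mg twice daily."):
        print(entity)
```

In this sketch the `window` argument stands in for the kind of parameter one might tune on a small sample of notes before applying an extractor to a new dataset: a window that is too wide picks up expressions belonging to other drugs in the same sentence, while one that is too narrow misses entities written further from the drug name.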
