Acquisition of a Lexicon for Family History Information: Bidirectional Encoder Representations From Transformers-Assisted Sublanguage Analysis

JMIR MEDICAL INFORMATICS (2023)

Journal

JMIR MEDICAL INFORMATICS

Volume 11, Issue -, Pages -

Publisher

JMIR PUBLICATIONS, INC

DOI: 10.2196/48072

Keywords

electronic health record; natural language processing; family history; sublanguage analysis; rule-based system; deep learning

Category

Medical Informatics

Abstract

Without a standardized method to capture family history (FH) information, FH information in electronic health records is difficult to use in data analytics or clinical decision support applications. This study aimed to construct an FH lexical resource for information extraction and normalization. Using a transformer-based method, a lexicon was developed and demonstrated through the development of rule-based and deep learning-based FH systems. The evaluation showed that the rule-based FH system performed well, and combining rule-based and deep learning-based systems improved FH information recall.

Background: A patient's family history (FH) information significantly influences downstream clinical care. Despite this importance, there is no standardized method to capture FH information in electronic health records, and a substantial portion of FH information is embedded in clinical notes. This makes FH information difficult to use in downstream data analytics or clinical decision support applications. A natural language processing system capable of extracting and normalizing FH information can address this issue.

Objective: In this study, we aimed to construct an FH lexical resource for information extraction and normalization.

Methods: We used a transformer-based method to construct an FH lexical resource from a corpus of clinical notes generated as part of primary care. The usability of the lexicon was demonstrated through the development of a rule-based FH system that extracts FH entities and relations as specified in previous FH challenges. We also experimented with a deep learning-based FH system for FH information extraction. Previous FH challenge data sets were used for evaluation.

Results: The resulting lexicon contains 33,603 lexicon entries normalized to 6408 concept unique identifiers of the Unified Medical Language System and 15,126 codes of the Systematized Nomenclature of Medicine Clinical Terms, with an average of 5.4 variants per concept. The performance evaluation demonstrated that the rule-based FH system achieved reasonable performance. Combining the rule-based FH system with a state-of-the-art deep learning-based FH system can improve the recall of FH information on the BioCreative/N2C2 FH challenge data set, while the F1 score varied but remained comparable.

Conclusions: The resulting lexicon and rule-based FH system are freely available through the Open Health Natural Language Processing GitHub.
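The recall result above can be illustrated with a small sketch. It assumes the two systems are combined by taking the union of their extracted mentions; the abstract does not state how the combination was done, so this is an assumption. The entity labels, the example mentions, and the `prf` helper are all hypothetical and are not taken from the challenge data or the authors' code.

```python
from typing import Set, Tuple

# Hypothetical representation of an extracted mention: (entity type, normalized text).
Mention = Tuple[str, str]


def prf(predicted: Set[Mention], gold: Set[Mention]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for a set of predicted mentions."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy gold standard and system outputs (invented for illustration only).
gold: Set[Mention] = {
    ("FamilyMember", "mother"),
    ("Observation", "breast cancer"),
    ("FamilyMember", "father"),
    ("Observation", "diabetes"),
}
rule_based: Set[Mention] = {
    ("FamilyMember", "mother"),
    ("Observation", "breast cancer"),
    ("Observation", "hypertension"),  # a false positive
}
deep_learning: Set[Mention] = {
    ("FamilyMember", "mother"),
    ("FamilyMember", "father"),
    ("Observation", "diabetes"),
}

# Assumed combination strategy: union of both systems' outputs.
combined = rule_based | deep_learning

for name, pred in [("rule-based", rule_based),
                   ("deep learning", deep_learning),
                   ("combined", combined)]:
    p, r, f = prf(pred, gold)
    print(f"{name:14s} P={p:.2f} R={r:.2f} F1={f:.2f}")
```

Under a union, every true positive found by either system is kept, so recall cannot fall below that of the better individual system. False positives from both systems are kept as well, so precision, and therefore F1, can move in either direction. This is consistent with the reported pattern of improved recall with a varied but comparable F1 score.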
