Article

A study into patient similarity through representation learning from medical records

KNOWLEDGE AND INFORMATION SYSTEMS (2022)

Journal

KNOWLEDGE AND INFORMATION SYSTEMS

Volume 64, Issue 12, Pages 3293-3324

Publisher

SPRINGER LONDON LTD

DOI: 10.1007/s10115-022-01740-2

Keywords

Patient similarity analytics; Patient representation learning; Natural language processing; Health informatics

Categories

Computer Science, Artificial Intelligence; Computer Science, Information Systems

Summary

This study introduces a new method for representing EMRs based on clinical narratives, using an unsupervised approach to integrate structured and unstructured data extracted from patients' EMRs. A tree structure model is employed to capture the temporal relations of multiple medical events, and new relabeling methods for non-leaf nodes are developed to capture temporal aspects. Evaluation showed that the proposed model improves performance on patient similarity and mortality prediction tasks compared to baseline methods.

Abstract

Patient similarity assessment, which identifies patients similar to a given patient, is a fundamental component of many secondary uses of medical data. The assessment can be performed using electronic medical records (EMRs). Patient similarity measurement requires converting heterogeneous EMRs into comparable formats to calculate distance. This study presents a new data representation method for EMRs that considers the information in clinical narratives. To address the limitations of previous approaches in handling complex parts of EMR data, we propose an unsupervised approach to building a patient representation that integrates unstructured and structured data extracted from patients' EMRs. We employed a tree structure to model the extracted data, capturing the temporal relations of multiple medical events from the EMR. We processed clinical notes to extract medical concepts using Python libraries such as MedspaCy and ScispaCy and mapped entities to the Unified Medical Language System (UMLS). To capture temporal aspects of the extracted events, we developed two new relabeling methods for the non-leaf nodes of the tree. To create an embedding vector for each patient, we traversed the tree to generate sequences for the Doc2vec algorithm. A comprehensive evaluation on patient similarity and mortality prediction tasks demonstrated that the proposed model achieves lower mean-squared error (MSE) and higher precision and normalized discounted cumulative gain (NDCG) than the baselines.
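
The tree-to-sequence step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the tree layout, the two relabeling methods, or the traversal order, so everything below is an assumption made for illustration: the `Node` class, the patient/visit/concept layout, the placeholder labels, and the choice of root-to-leaf paths are all hypothetical and are not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical tree node: a label plus an ordered list of children."""
    label: str
    children: List["Node"] = field(default_factory=list)


def root_to_leaf_sequences(node: Node, prefix: Optional[List[str]] = None) -> List[List[str]]:
    """Return one label sequence per root-to-leaf path, in child order.

    This is one simple traversal choice (an assumption); the paper may
    traverse the tree differently.
    """
    path = (prefix or []) + [node.label]
    if not node.children:
        return [path]
    sequences: List[List[str]] = []
    for child in node.children:
        sequences.extend(root_to_leaf_sequences(child, path))
    return sequences


# Hypothetical patient with two visits; concept labels are placeholders,
# not real UMLS identifiers.
patient = Node("patient_1", [
    Node("visit_1", [Node("concept:diabetes"), Node("concept:hypertension")]),
    Node("visit_2", [Node("concept:heart_failure")]),
])

sequences = root_to_leaf_sequences(patient)
for seq in sequences:
    print(" ".join(seq))
```

Each resulting sequence is a list of tokens, which is the kind of input a document-embedding model such as Doc2vec consumes; training that model is left out here to keep the sketch free of third-party dependencies.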
