Article

Evaluating semantic similarity methods for comparison of text-derived phenotype profiles

BMC MEDICAL INFORMATICS AND DECISION MAKING (2022)

Journal

BMC MEDICAL INFORMATICS AND DECISION MAKING

Volume 22, Issue 1, Pages -

Publisher

BMC

DOI: 10.1186/s12911-022-01770-4

Keywords

Semantic web; Ontology; Differential diagnosis; MIMIC-III; Semantic similarity

Category

Medical Informatics

Funding

NIHR Birmingham ECMC
Nanocommons H2020-EU [731032]
NIHR Birmingham Biomedical Research Centre
MRC HDR UK [HDRUK/CFC/01]
UK Research and Innovation, Department of Health and Social Care (England)
King Abdullah University of Science and Technology (KAUST) Office of Sponsored Research (OSR) [URF/1/3790-01-01]
Medical Research Council [MR/S003991/1]
NIHR Birmingham SRMRC

Smart summary

Semantic similarity is a valuable tool in biomedical analysis, especially for analysing patient phenotypes in clinical tasks. This study developed a reproducible benchmarking platform to evaluate patient phenotype similarity in uncurated phenotype profiles. The results showed that configurations using term-specificity and annotation-frequency measures performed best among those evaluated.

Abstract

Background

Semantic similarity is a valuable tool for analysis in biomedicine. When applied to phenotype profiles derived from clinical text, it can enable and enhance 'patient-like-me' analyses, automated coding, differential diagnosis, and outcome prediction. Although a large body of work explores the use of semantic similarity for multiple tasks, including protein interaction prediction and rare disease differential diagnosis, less work has explored the comparison of patient phenotype profiles for clinical tasks. Moreover, there have been no experimental explorations of optimal parameters or better-performing methods in this area.

Methods

We develop a platform for reproducible benchmarking and comparison of experimental conditions for patient phenotype similarity. Using the platform, we evaluate the task of ranking profiles by shared primary diagnosis, using uncurated phenotype profiles derived from all text narrative associated with admissions in the Medical Information Mart for Intensive Care (MIMIC-III).

Results

In total, 300 semantic similarity configurations were evaluated, as well as one embedding-based approach. On average, measures that did not make use of an external information content measure performed slightly better; however, the best-performing configurations, as measured by area under the receiver operating characteristic curve (AUC) and Top Ten Accuracy, used term-specificity and annotation-frequency measures.

Conclusion

We identified and interpreted the performance of a large number of semantic similarity configurations for the task of classifying diagnosis from text-derived phenotype profiles in one setting. We also provided a basis for further research on other settings and related tasks in the area.
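As a rough illustration of the kind of pipeline evaluated above, the Python sketch below scores pairs of phenotype profiles with one representative configuration and checks how well the scores recover a shared primary diagnosis. It is a minimal sketch under loudly stated assumptions, not the authors' platform: the toy ontology, term names, admissions and diagnoses are hypothetical; the configuration (Resnik term similarity with an annotation-frequency information content, combined by best-match average) is one common choice picked for illustration; and the definitions used for Top-k accuracy (a hit if any of the k most similar other profiles shares the query's diagnosis) and pairwise AUC are assumptions rather than the study's exact definitions.

```python
import math
from itertools import combinations

# Hypothetical toy ontology: term -> set of direct parents (a DAG rooted at "root").
PARENTS = {
    "root": set(),
    "abnormality_of_heart": {"root"},
    "arrhythmia": {"abnormality_of_heart"},
    "atrial_fibrillation": {"arrhythmia"},
    "heart_failure": {"abnormality_of_heart"},
    "abnormality_of_lung": {"root"},
    "pneumonia": {"abnormality_of_lung"},
    "dyspnea": {"abnormality_of_lung"},
}

# Hypothetical text-derived profiles and primary diagnoses for four admissions.
PROFILES = {
    "adm1": {"atrial_fibrillation", "dyspnea"},
    "adm2": {"arrhythmia", "heart_failure"},
    "adm3": {"pneumonia", "dyspnea"},
    "adm4": {"pneumonia"},
}
DIAGNOSIS = {"adm1": "cardiac", "adm2": "cardiac",
             "adm3": "respiratory", "adm4": "respiratory"}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen, stack = {term}, [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def annotation_ic(profiles):
    """Assumed annotation-frequency IC: -log p(t), where p(t) is the fraction
    of profiles annotated with t or one of its descendants."""
    counts = {t: 0 for t in PARENTS}
    for terms in profiles.values():
        for t in set().union(*(ancestors(x) for x in terms)):
            counts[t] += 1
    n = len(profiles)
    # Terms no profile reaches can never be a common ancestor; give them 0.
    return {t: -math.log(c / n) if c else 0.0 for t, c in counts.items()}


def resnik(a, b, ic):
    """IC of the most informative common ancestor of terms a and b."""
    return max(ic[t] for t in ancestors(a) & ancestors(b))


def bma(p, q, ic):
    """Best-match average of pairwise Resnik scores between two term sets."""
    best_p = [max(resnik(a, b, ic) for b in q) for a in p]
    best_q = [max(resnik(a, b, ic) for a in p) for b in q]
    return (sum(best_p) / len(p) + sum(best_q) / len(q)) / 2


def top_k_accuracy(profiles, diagnosis, ic, k=10):
    """Fraction of queries with a same-diagnosis profile among the k most similar."""
    hits = 0
    for query in profiles:
        others = [o for o in profiles if o != query]
        ranked = sorted(others, reverse=True,
                        key=lambda o: bma(profiles[query], profiles[o], ic))
        hits += any(diagnosis[o] == diagnosis[query] for o in ranked[:k])
    return hits / len(profiles)


def pairwise_auc(profiles, diagnosis, ic):
    """Probability that a same-diagnosis pair outscores a different-diagnosis
    pair (ties count 1/2), i.e. the Mann-Whitney form of the ROC AUC."""
    pos, neg = [], []
    for a, b in combinations(profiles, 2):
        score = bma(profiles[a], profiles[b], ic)
        (pos if diagnosis[a] == diagnosis[b] else neg).append(score)
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


IC = annotation_ic(PROFILES)

if __name__ == "__main__":
    print("AUC:", pairwise_auc(PROFILES, DIAGNOSIS, IC))
    print("Top-1 accuracy:", top_k_accuracy(PROFILES, DIAGNOSIS, IC, k=1))
```

On this four-admission toy set both printed metrics are 1.0, which says nothing about real data. k=1 is used here because, with only three candidates per query, a top-10 cut-off would be satisfied trivially. Swapping `annotation_ic` for a structure-based (intrinsic) IC, or `bma` for another groupwise combination, is how a sketch like this would vary the configuration; the specific variants compared in the study are not reproduced here.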