Article

Semi-supervised encoding for outlier detection in clinical observation data

COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE (2019)

Journal

COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE

Volume 181, Issue -, Pages -

Publisher

ELSEVIER IRELAND LTD

DOI: 10.1016/j.cmpb.2019.01.002

Keywords

Neural Networks; Encoding; Semi-supervised encoding; Outlier detection; Data quality; Electronic Health Records

Categories

Computer Science, Interdisciplinary Applications; Computer Science, Theory & Methods; Engineering, Biomedical; Medical Informatics

Funding

Patient-Centered Outcomes Research Institute (PCORI) [CDRN-1306-04608]
NIH [R01-HG009174]
NLM [T15LM007092]


Abstract

Background and Objective: Electronic Health Record (EHR) data often include observation records that are unlikely to represent the truth about a patient at a given clinical encounter. Because laboratory test results and vital signs are recorded at high throughput, such implausible observations are frequent in those records. Outlier detection methods can offer low-cost solutions for flagging implausible EHR observations. This article evaluates the utility of a semi-supervised encoding approach (super-encoding) for constructing non-linear exemplar data distributions from EHR observation data and detecting non-conforming observations as outliers.

Methods: Two hypotheses are tested using experimental design and non-parametric hypothesis testing procedures: (1) adding demographic features (e.g., age, gender, race/ethnicity) can increase precision in outlier detection, and (2) sampling small subsets of the large EHR data can improve outlier detection by reducing the noise-to-signal ratio. The experiments involved applying 492 encoder configurations (involving different input features, architectures, sampling ratios, and error margins) to a set of 30 datasets of EHR observations, including laboratory tests and vital sign records, extracted from the Research Patient Data Registry (RPDR) from Partners HealthCare.

Results: Results were obtained from 14,760 encoders (30 x 492). The semi-supervised encoding approach (super-encoding) outperformed conventional autoencoders in outlier detection. Adding the age of the patient at the observation (encounter) to the baseline encoder, which included only the observation value as the input feature, slightly improved outlier detection. The nine top-performing encoders are introduced. The best outlier detection performance came from a semi-supervised encoder with the observation value as the single feature and a single hidden layer, built on one percent of the data and one percent reconstruction error. At least one encoder configuration had a Youden's J index higher than 0.9999 for all 30 observation types.

Conclusion: Given the multiplicity of distributions for a single observation in EHR data (i.e., the same observation represented with different names or units), as well as the non-linearity of human observations, encoding holds great promise for outlier detection in large-scale data repositories. (C) 2019 Elsevier B.V. All rights reserved.
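The abstract reports performance as Youden's J index, which is defined as sensitivity plus specificity minus one. As a minimal sketch of how that score could be computed from outlier flags, assuming boolean labels where True marks an outlier (the function name `youden_j` and the toy labels are hypothetical and are not taken from the paper):

```python
def youden_j(y_true, y_pred):
    """Youden's J = sensitivity + specificity - 1 (True marks an outlier)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)          # outliers correctly flagged
    tn = sum(1 for t, p in pairs if not t and not p)  # inliers correctly passed
    fp = sum(1 for t, p in pairs if not t and p)      # inliers wrongly flagged
    fn = sum(1 for t, p in pairs if t and not p)      # outliers missed
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one true outlier and one true inlier")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1.0


# Hypothetical toy labels, for illustration only.
truth = [True, True, False, False, False, False]
flags = [True, False, False, False, False, True]
print(youden_j(truth, flags))  # sensitivity 0.5, specificity 0.75
```

A J of 1 means every outlier is flagged and no valid observation is, while a J of 0 is no better than chance, so a value above 0.9999 corresponds to near-perfect separation on the labeled data.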
