PARAMO: A PARAllel predictive MOdeling platform for healthcare analytic research using electronic health records

JOURNAL OF BIOMEDICAL INFORMATICS (2014)

Journal

JOURNAL OF BIOMEDICAL INFORMATICS

Volume 48, Pages 160-170

Publisher

ACADEMIC PRESS INC ELSEVIER SCIENCE

DOI: 10.1016/j.jbi.2013.12.012

Keywords

Predictive modeling; Electronic health records; Scientific workflows; Parallel computing; Map reduce

Categories

Computer Science, Interdisciplinary Applications; Medical Informatics

Funding

IBM Research
National Center for Advancing Translational Sciences [UL1TR000445]
National Heart, Lung, and Blood Institute [1R01HL116832-01]

Abstract

Objective: Healthcare analytics research increasingly involves the construction of predictive models for disease targets across varying patient cohorts using electronic health records (EHRs). To facilitate this process, it is critical to support a pipeline of tasks: (1) cohort construction, (2) feature construction, (3) cross-validation, (4) feature selection, and (5) classification. To develop an appropriate model, it is necessary to compare and refine models derived from a diversity of cohorts, patient-specific features, and statistical frameworks. The goal of this work is to develop and evaluate a predictive modeling platform that can be used to simplify and expedite this process for health data.

Methods: To support this goal, we developed a PARAllel predictive MOdeling (PARAMO) platform which (1) constructs a dependency graph of tasks from specifications of predictive modeling pipelines, (2) schedules the tasks in a topological ordering of the graph, and (3) executes those tasks in parallel. We implemented this platform using Map-Reduce to enable independent tasks to run in parallel in a cluster computing environment. Different task scheduling preferences are also supported.

Results: We assess the performance of PARAMO on various workloads using three datasets derived from the EHR systems in place at Geisinger Health System and Vanderbilt University Medical Center and an anonymous longitudinal claims database. We demonstrate significant gains in computational efficiency against a standard approach. In particular, PARAMO can build 800 different models on a 300,000-patient dataset in 3 hours in parallel, compared with 9 days when run sequentially.

Conclusion: This work demonstrates that an efficient parallel predictive modeling platform can be developed for EHR data. This platform can facilitate large-scale modeling endeavors and speed up the research workflow and reuse of health information. This platform is only a first step and provides the foundation for our ultimate goal of building analytic pipelines that are specialized for health data researchers. © 2013 Elsevier Inc. All rights reserved.
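
The three steps named in the Methods (build a dependency graph from pipeline specifications, schedule in topological order, run independent tasks in parallel) can be illustrated with a minimal Python sketch. This is not PARAMO's implementation: the abstract says the platform uses Map-Reduce on a cluster, while the sketch substitutes a local thread pool from the standard library. The names `STAGES`, `build_graph`, `run_task`, and `execute`, the specification format, and the rule that pipelines sharing identical early stages share those tasks are all assumptions made for illustration.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# The five pipeline stages listed in the abstract (identifiers are hypothetical).
STAGES = ["cohort", "features", "cross_validation",
          "feature_selection", "classification"]


def build_graph(pipelines):
    """Map each task to the set of tasks it depends on.

    A task is identified by the tuple of settings for all stages up to and
    including its own, so pipelines that agree on their early stages reuse
    the same task nodes (an assumption of this sketch).
    """
    graph = {}
    for spec in pipelines:
        previous = None
        for i in range(len(STAGES)):
            task = tuple(spec[stage] for stage in STAGES[: i + 1])
            deps = graph.setdefault(task, set())
            if previous is not None:
                deps.add(previous)
            previous = task
    return graph


def run_task(task):
    """Placeholder for real work; returns a label for the finished task."""
    return f"{STAGES[len(task) - 1]}:{task[-1]}"


def execute(graph, max_workers=4):
    """Run tasks in topological order, with independent tasks in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    results, running = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Submit every task whose dependencies have all completed.
            for task in sorter.get_ready():
                running[pool.submit(run_task, task)] = task
            # Block until at least one running task finishes.
            finished, _ = wait(running, return_when=FIRST_COMPLETED)
            for future in finished:
                task = running.pop(future)
                results[task] = future.result()
                sorter.done(task)  # unlocks tasks that depended on this one
    return results


# Two hypothetical pipelines that differ only in the final classifier.
base = {"cohort": "heart_failure", "features": "diagnoses",
        "cross_validation": "10fold", "feature_selection": "chi2"}
pipelines = [
    {**base, "classification": "logistic_regression"},
    {**base, "classification": "random_forest"},
]
graph = build_graph(pipelines)
results = execute(graph)
```

In this example the two pipelines share their first four stages, so the graph holds six tasks rather than ten, and the two classification tasks become ready at the same time once feature selection completes.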
