A Pipeline for Integrated Theory and Data-Driven Modeling of Biomedical Data

IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS (2021)

Journal

IEEE-ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS

Volume 18, Issue 3, Pages 811-822

Publisher

IEEE COMPUTER SOC

DOI: 10.1109/TCBB.2020.3019237

Keywords

Bioinformatics; Graphical models; Feature extraction; Biological system modeling; Computational modeling; Data models; Genomics; Feature selection; Phenotype prediction

Funding

NIH [U01HL137159, R01LM012087, T32CA082084]

Automated Summary

Genome sequencing technologies have the potential to transform clinical decision making and biomedical research by enabling high-throughput measurements of the genome at a granular level. To better understand disease mechanisms and predict the effects of medical interventions, high-throughput genomic data must be integrated with demographic, phenotypic, environmental, and behavioral information, and the relationships between these data types must be inferred. A new methodology called piPref-Div selects informative variables for probabilistic graphical models, improving breast cancer outcome prediction and providing biologically interpretable views of gene expression data.

Abstract

Genome sequencing technologies have the potential to transform clinical decision making and biomedical research by enabling high-throughput measurements of the genome at a granular level. However, to truly understand mechanisms of disease and predict the effects of medical interventions, high-throughput data must be integrated with demographic, phenotypic, environmental, and behavioral data from individuals. Further, effective knowledge discovery methods must infer relationships between these data types. We recently proposed a pipeline (CausalMGM) to achieve this. CausalMGM uses probabilistic graphical models to infer the relationships between variables in the data; however, CausalMGM's graphical structure learning algorithm can only handle small datasets efficiently. We propose a new methodology (piPref-Div) that selects the most informative variables for CausalMGM, enabling it to scale. We validate the efficacy of piPref-Div against other feature selection methods and demonstrate how the use of the full pipeline improves breast cancer outcome prediction and provides biologically interpretable views of gene expression data.
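
The abstract does not say how piPref-Div chooses variables, so the sketch below is not that method. It only illustrates the general idea of shrinking a wide dataset to a small set of informative, non-redundant variables before running a structure learning step that scales poorly. The function names, the use of Pearson correlation as a relevance score, and the `max_redundancy` threshold are all assumptions made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def select_diverse_features(features, target, k, max_redundancy=0.8):
    """Greedy relevance/redundancy filter (hypothetical, NOT piPref-Div).

    features: dict mapping variable name -> list of values, one per sample
    target:   list of outcome values, one per sample
    k:        maximum number of variables to keep
    """
    # Rank variables by absolute correlation with the outcome, most relevant first.
    ranked = sorted(
        features,
        key=lambda name: abs(pearson(features[name], target)),
        reverse=True,
    )
    selected = []
    for name in ranked:
        if len(selected) == k:
            break
        # Keep a variable only if it is not too correlated with any already kept.
        if all(
            abs(pearson(features[name], features[kept])) < max_redundancy
            for kept in selected
        ):
            selected.append(name)
    return selected


if __name__ == "__main__":
    # Toy data: two near-identical expression variables and one clinical variable.
    toy = {
        "gene_a": [1.0, 1.2, 3.1, 2.9, 3.0, 0.9],
        "gene_a_scaled": [2.0, 2.4, 6.2, 5.8, 6.0, 1.8],
        "age": [30, 60, 50, 40, 70, 45],
    }
    outcome = [0, 0, 1, 1, 1, 0]
    print(select_diverse_features(toy, outcome, k=2))
```

In a pipeline of the kind the abstract describes, the reduced variable set would then be passed to the graphical model learner. A filter like this one uses only pairwise correlations, which is a deliberate simplification.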
