Article

Biological impact of missing-value imputation on downstream analyses of gene expression profiles

BIOINFORMATICS (2011)

Journal

BIOINFORMATICS

Volume 27, Issue 1, Pages 78-86

Publisher

OXFORD UNIV PRESS

DOI: 10.1093/bioinformatics/btq613

Categories

Biochemical Research Methods; Biotechnology & Applied Microbiology; Computer Science, Interdisciplinary Applications; Mathematical & Computational Biology; Statistics & Probability

Funding

University of Louisville Cardinal Research Cluster
National Institutes of Health [KL2 RR024154-03, P30ES014443, P20RR017702, RC2AA019385-01]
University of Pittsburgh
Department of Energy [10EM00542]
NATIONAL CENTER FOR RESEARCH RESOURCES [KL2RR024154, P20RR017702] Funding Source: NIH RePORTER
NATIONAL INSTITUTE OF ENVIRONMENTAL HEALTH SCIENCES [P30ES014443] Funding Source: NIH RePORTER
NATIONAL INSTITUTE ON ALCOHOL ABUSE AND ALCOHOLISM [RC2AA019385] Funding Source: NIH RePORTER

Abstract

Motivation: Microarray experiments frequently produce multiple missing values (MVs) due to flaws such as dust, scratches, insufficient resolution or hybridization errors on the chips. Unfortunately, many downstream algorithms require a complete data matrix. The motivation of this work is to determine the impact of MV imputation on downstream analysis, and whether ranking of imputation methods by imputation accuracy correlates well with the biological impact of the imputation.

Methods: Using eight datasets for differential expression (DE) and classification analysis and eight datasets for gene clustering, we demonstrate the biological impact of missing-value imputation on statistical downstream analyses, including three commonly employed DE methods, four classifiers and three gene-clustering methods. Correlation between the rankings of imputation methods based on three root-mean squared error (RMSE) measures and the rankings based on the downstream analysis methods was used to investigate which RMSE measure was most consistent with the biological impact measures, and which downstream analysis methods were the most sensitive to the choice of imputation procedure.

Results: DE was the most sensitive to the choice of imputation procedure, while classification was the least sensitive and clustering was intermediate between the two. The logged RMSE (LRMSE) measure had the highest correlation with the imputation rankings based on the DE results, indicating that the LRMSE is the best representative surrogate among the three RMSE-based measures. Bayesian principal component analysis and least squares adaptive appeared to be the best performing methods in the empirical downstream evaluation.
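The evaluation idea in the Methods paragraph (score each imputation method by RMSE on hidden entries, then check whether that ranking agrees with a ranking based on downstream results) can be illustrated with a minimal sketch. Several things here are assumptions and not taken from the paper: the abstract does not define its three RMSE measures or say which rank correlation was used, so the sketch uses a plain RMSE over masked entries and Spearman correlation; row-mean imputation is only a simple stand-in for methods such as Bayesian principal component analysis; and all function names and the per-method scores are hypothetical, made-up illustrations.

```python
import math


def row_mean_impute(data, mask):
    """Stand-in imputer: fill each hidden entry with the mean of the
    observed (unmasked) values in the same row (gene)."""
    out = [row[:] for row in data]
    for i, row in enumerate(data):
        observed = [v for j, v in enumerate(row) if not mask[i][j]]
        fill = sum(observed) / len(observed)
        for j in range(len(row)):
            if mask[i][j]:
                out[i][j] = fill
    return out


def rmse_on_masked(truth, imputed, mask):
    """Plain RMSE computed only over entries that were hidden."""
    sq = [
        (truth[i][j] - imputed[i][j]) ** 2
        for i in range(len(truth))
        for j in range(len(truth[0]))
        if mask[i][j]
    ]
    return math.sqrt(sum(sq) / len(sq))


def ranks(scores):
    """Rank scores from 1 (smallest) upward; ties share the average rank."""
    order = sorted(range(len(scores)), key=lambda k: scores[k])
    result = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1
        for k in order[pos:end + 1]:
            result[k] = avg
        pos = end + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the two rank vectors."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


# Toy complete matrix (rows = genes, columns = samples) with two hidden entries.
truth = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
mask = [[False, False, True], [True, False, False]]
imputed = row_mean_impute(truth, mask)
error = rmse_on_masked(truth, imputed, mask)

# Hypothetical scores for four imputation methods (made-up numbers):
# an accuracy score and a downstream-analysis loss, lower is better for both.
rmse_scores = [0.30, 0.45, 0.60, 0.52]
downstream_loss = [0.10, 0.20, 0.40, 0.35]
agreement = spearman(rmse_scores, downstream_loss)
```

A value of `agreement` near 1 would mean the accuracy-based ranking is a good surrogate for the downstream ranking, which is the kind of comparison the abstract reports for the LRMSE measure against the DE results.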
