☆ 4.7 Article

ProJect: a powerful mixed-model missing value imputation method

BRIEFINGS IN BIOINFORMATICS (2023)

期刊

BRIEFINGS IN BIOINFORMATICS

卷 -, 期 -, 页码 -

出版社

OXFORD UNIV PRESS

DOI: 10.1093/bib/bbad233

关键词

bioinformatics; missing at random (MAR); missing not at random (MNAR); missing value imputation (MVI); statistics

类别

Biochemical Research Methods Mathematical & Computational Biology

向作者/读者索取更多资源

Protocol

社区支持

Reagent

社区支持

智能总结 New
摘要

Missing values can have negative effects on data analysis and machine learning model development. A new mixed-model method called ProJect (Protein inJection) is proposed for missing value imputation, which is an improvement over existing methods. ProJect consistently outperforms other methods in tests on various high-throughput data types. It handles different types of missing values and achieves more accurate and reliable imputation outcomes.

Missing values (MVs) can adversely impact data analysis and machine-learning model development. We propose a novel mixed-model method for missing value imputation (MVI). This method, ProJect (short for Protein inJection), is a powerful and meaningful improvement over existing MVI methods such as Bayesian principal component analysis (PCA), probabilistic PCA, local least squares and quantile regression imputation of left-censored data. We rigorously tested ProJect on various high-throughput data types, including genomics and mass spectrometry (MS)-based proteomics. Specifically, we utilized renal cancer (RC) data acquired using DIA-SWATH, ovarian cancer (OC) data acquired using DIA-MS, bladder (BladderBatch) and glioblastoma (GBM) microarray gene expression dataset. Our results demonstrate that ProJect consistently performs better than other referenced MVI methods. It achieves the lowest normalized root mean square error (on average, scoring 45.92% less error in RC_C, 27.37% in RC_full, 29.22% in OC, 23.65% in BladderBatch and 20.20% in GBM relative to the closest competing method) and the Procrustes sum of squared error (Procrustes SS) (exhibits 79.71% less error in RC_C, 38.36% in RC full, 18.13% in OC, 74.74% in BladderBatch and 30.79% in GBM compared to the next best method). ProJect also leads with the highest correlation coefficient among all types of MV combinations (0.64% higher in RC_C, 0.24% in RC full, 0.55% in OC, 0.39% in BladderBatch and 0.27% in GBM versus the second-best performing method). ProJect's key strength is its ability to handle different types of MVs commonly found in real-world data. Unlike most MVI methods that are designed to handle only one type of MV, ProJect employs a decision-making algorithm that first determines if an MV is missing at random or missing not at random. It then employs targeted imputation strategies for each MV type, resulting in more accurate and reliable imputation outcomes. An R implementation of ProJect is available at .

ProJect: a powerful mixed-model missing value imputation method

期刊

BRIEFINGS IN BIOINFORMATICS

出版社

OXFORD UNIV PRESS

关键词

类别

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

ProJect: a powerful mixed-model missing value imputation method

期刊

BRIEFINGS IN BIOINFORMATICS

出版社

OXFORD UNIV PRESS

关键词

类别

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

导出引文

分享论文