A comparative study of evaluating missing value imputation methods in label-free proteomics

SCIENTIFIC REPORTS (2021)

Journal

SCIENTIFIC REPORTS

Volume 11, Issue 1

Publisher

NATURE PORTFOLIO

DOI: 10.1038/s41598-021-81279-4

Category

Multidisciplinary Sciences

Funding

AbbVie

Abstract

The presence of missing values (MVs) in label-free quantitative proteomics greatly reduces data completeness. Imputation has been widely used to handle MVs, and selecting the proper method is critical for the accuracy and reliability of imputation. Here we present a comparative study that evaluates the performance of seven popular imputation methods on a large-scale benchmark dataset and an immune cell dataset. Simulated MVs were introduced into the complete part of each dataset with different combinations of MV rates and missing not at random (MNAR) rates. Normalized root mean square error (NRMSE) was used to evaluate the accuracy of protein abundances and intergroup protein ratios after imputation. The detection of true positives (TPs) and the false altered-protein discovery rate (FADR) between groups were also compared using the benchmark dataset. Furthermore, the accuracy of handling real MVs was assessed by comparing enriched pathways and signature genes of cell activation after imputing the immune cell dataset. We observed that imputation accuracy is primarily affected by the MNAR rate rather than the MV rate, and that downstream analysis can be strongly affected by the choice of imputation method. A random forest-based imputation method consistently outperformed the other popular methods, achieving the lowest NRMSE, a high number of TPs with an average FADR < 5%, and the best detection of relevant pathways and signature genes, which highlights it as the most suitable method for label-free proteomics.
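The evaluation design described above (mask a complete matrix at a chosen MV rate and MNAR rate, impute, then score the imputed entries with NRMSE) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' code: the masking scheme (MNAR entries drawn from the low-abundance tail, MCAR entries drawn uniformly), the missForest-style NRMSE normalization by the variance of the true masked values, and the helper names `simulate_missing`, `impute_row_mean`, and `nrmse` are all hypothetical. The row-mean fill is only a placeholder imputer and is not the random forest method the study recommends.

```python
import numpy as np


def simulate_missing(x, mv_rate, mnar_rate, rng):
    """Return a copy of the complete matrix `x` with simulated missing values (NaN).

    Hypothetical scheme: mv_rate * x.size entries are removed in total; a
    fraction `mnar_rate` of them is MNAR, drawn at random from the 2 * n_mnar
    lowest-abundance entries, and the rest are MCAR, drawn uniformly from the
    entries that are still observed.
    """
    n_total = int(round(mv_rate * x.size))
    n_mnar = int(round(mnar_rate * n_total))
    n_mcar = n_total - n_mnar

    out = x.astype(float).ravel()                   # flat working copy; x is not modified
    order = np.argsort(out)                         # indices from lowest to highest abundance
    pool = order[: min(2 * n_mnar, out.size)]       # low-abundance candidates for MNAR
    mnar_idx = rng.choice(pool, size=n_mnar, replace=False)

    remaining = np.setdiff1d(np.arange(out.size), mnar_idx)
    mcar_idx = rng.choice(remaining, size=n_mcar, replace=False)

    out[np.concatenate([mnar_idx, mcar_idx])] = np.nan
    return out.reshape(x.shape)


def impute_row_mean(x_missing):
    """Placeholder imputer: fill NaNs with the mean of each row's (protein's) observed values."""
    out = x_missing.copy()
    global_mean = np.nanmean(out)                   # fallback for rows with no observed values
    for row in out:                                 # each row is a view into `out`
        obs = row[~np.isnan(row)]
        row[np.isnan(row)] = obs.mean() if obs.size else global_mean
    return out


def nrmse(x_true, x_imputed, missing_mask):
    """NRMSE over masked entries, assuming the missForest-style definition:
    sqrt(mean((true - imputed)^2) / var(true))."""
    t = x_true[missing_mask]
    p = x_imputed[missing_mask]
    return float(np.sqrt(np.mean((t - p) ** 2) / np.var(t)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic log2 abundances: 200 proteins x 6 samples (not data from the study).
    x = rng.normal(loc=25.0, scale=2.0, size=(200, 6))
    x_mv = simulate_missing(x, mv_rate=0.2, mnar_rate=0.5, rng=rng)
    mask = np.isnan(x_mv)
    print("fraction missing:", mask.mean())
    print("NRMSE (row-mean placeholder):", nrmse(x, impute_row_mean(x_mv), mask))
```

Repeating this loop over a grid of `mv_rate` and `mnar_rate` values, with `impute_row_mean` swapped for each candidate imputer, mirrors the kind of comparison the abstract describes. Because MNAR entries come from the low-abundance tail, a row-mean fill tends to overestimate them, so its NRMSE would be expected to grow as `mnar_rate` increases.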
