Article

Why have so few proteomic biomarkers survived validation? (Sample size and independent validation considerations)

PROTEOMICS (2014)

Journal

PROTEOMICS

Volume 14, Issue 13-14, Pages 1587-1592

Publisher

WILEY

DOI: 10.1002/pmic.201300377

Keywords

Bioinformatics; Biomarker panels; Cross-validation; Proteomic discovery; Random forest; Sample size

Categories

Biochemical Research Methods; Biochemistry & Molecular Biology

Funding

Irish Research Council
Science Foundation Ireland
Health Research Board [HRA_POR/2011/125]
Irish Cancer Society [PCI11WAT]
Program for Research in Third Level Institutions

Abstract

Proteomic biomarker discovery has led to the identification of numerous potential candidates for disease diagnosis, prognosis, and prediction of response to therapy. However, very few of these identified candidate biomarkers reach clinical validation and go on to be routinely used in clinical practice. One particular issue with biomarker discovery is the identification of significantly changing proteins in the initial discovery experiment that do not validate when subsequently tested on separate patient sample cohorts. Here, we seek to highlight some of the statistical challenges surrounding the analysis of LC-MS proteomic data for biomarker candidate discovery. We show that common statistical algorithms run on data with low sample sizes can overfit and yield misleading misclassification rates and AUC values. A common solution to this problem is to prefilter variables (via, e.g., ANOVA and/or use of correction methods such as Bonferroni or false discovery rate) to give a smaller dataset and reduce the size of the apparent statistical challenge. However, we show that this exacerbates the problem, yielding even higher performance metrics while reducing the predictive accuracy of the biomarker panel. To illustrate some of these limitations, we have run simulation analyses with known biomarkers. For our chosen algorithm (random forests), we show that the above problems are substantially reduced if a sufficient number of samples are analyzed and the data are not prefiltered. Our view is that LC-MS proteomic biomarker discovery data should be analyzed without prefiltering and that increasing the sample size in biomarker discovery experiments should be a very high priority.
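The prefiltering effect described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' simulation and makes several assumptions of its own: the data are pure Gaussian noise with no true biomarkers, the prefilter is a two-sample t-statistic (equivalent to a one-way ANOVA for two classes), and a nearest-centroid classifier stands in for random forests so that the example needs only the Python standard library. All function names and parameter values (20 samples, 300 features, 10 retained features, 5 folds) are hypothetical. It compares cross-validated accuracy when features are prefiltered once on all samples against accuracy when the same filter is refitted inside each training fold.

```python
import math
import random


def mean(values):
    return sum(values) / len(values)


def t_score(a, b):
    """Absolute two-sample t-statistic with pooled variance."""
    ma, mb = mean(a), mean(b)
    ss = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    pooled = ss / (len(a) + len(b) - 2)
    return abs(ma - mb) / math.sqrt(pooled * (1 / len(a) + 1 / len(b)))


def top_features(X, y, rows, k):
    """Indices of the k highest-scoring features, computed from `rows` only."""
    scores = []
    for j in range(len(X[0])):
        a = [X[i][j] for i in rows if y[i] == 0]
        b = [X[i][j] for i in rows if y[i] == 1]
        scores.append((t_score(a, b), j))
    return [j for _, j in sorted(scores, reverse=True)[:k]]


def predict(X, y, train, feats, i):
    """Nearest-centroid prediction for sample i (stand-in for random forests)."""
    dist = {}
    for label in (0, 1):
        members = [r for r in train if y[r] == label]
        centroid = [mean([X[r][j] for r in members]) for j in feats]
        dist[label] = sum((X[i][j] - c) ** 2 for j, c in zip(feats, centroid))
    return 0 if dist[0] <= dist[1] else 1


def cv_accuracy(X, y, k, folds, prefilter_first):
    """k-fold accuracy; `prefilter_first` selects features on all samples."""
    rows = list(range(len(y)))
    global_feats = top_features(X, y, rows, k) if prefilter_first else None
    correct = 0
    for f in range(folds):
        test = [i for i in rows if i % folds == f]
        train = [i for i in rows if i % folds != f]
        feats = global_feats if prefilter_first else top_features(X, y, train, k)
        correct += sum(predict(X, y, train, feats, i) == y[i] for i in test)
    return correct / len(y)


def simulate(reps=20, n=20, p=300, k=10, folds=5, seed=0):
    """Average both accuracy estimates over `reps` pure-noise datasets."""
    rng = random.Random(seed)
    prefiltered, unfiltered = [], []
    for _ in range(reps):
        y = [i % 2 for i in range(n)]  # alternating labels keep folds balanced
        X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
        prefiltered.append(cv_accuracy(X, y, k, folds, True))
        unfiltered.append(cv_accuracy(X, y, k, folds, False))
    return mean(prefiltered), mean(unfiltered)


if __name__ == "__main__":
    leaky_acc, honest_acc = simulate()
    print(f"prefilter on all samples:   {leaky_acc:.2f}")
    print(f"prefilter inside each fold: {honest_acc:.2f}")
```

Because the simulated features carry no signal, any accuracy well above 0.5 is an artifact of the analysis. Under these assumptions the first estimate should come out far higher than the second, which is one mechanism by which a prefiltered panel can look strong in discovery and then fail on an independent cohort.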
