Article

A Critical Assessment of Feature Selection Methods for Biomarker Discovery in Clinical Proteomics

MOLECULAR & CELLULAR PROTEOMICS (2013)

Journal

MOLECULAR & CELLULAR PROTEOMICS

Volume 12, Issue 1, Pages 263-276

Publisher

AMER SOC BIOCHEMISTRY MOLECULAR BIOLOGY INC

DOI: 10.1074/mcp.M112.022566

Category

Biochemical Research Methods

Funding

Netherlands Bioinformatics Center
joint Gaining Momentum Initiative of the Netherlands Bioinformatics
Netherlands Proteomics Centers

Abstract

In this paper, we compare the performance of six feature selection methods for LC-MS-based proteomics and metabolomics biomarker discovery: the t test, the Mann-Whitney-Wilcoxon test (mww test), nearest shrunken centroid (NSC), linear support vector machine-recursive feature elimination (SVM-RFE), principal component discriminant analysis (PCDA), and partial least squares discriminant analysis (PLSDA). The methods were evaluated on human urine and porcine cerebrospinal fluid samples spiked with a range of peptides at different concentration levels. The ideal feature selection method should select the complete list of discriminating features related to the spiked peptides without selecting unrelated features. Whereas many studies must rely on classification error to judge the reliability of the selected biomarker candidates, we assessed the accuracy of selection directly against the list of spiked peptides. The feature selection methods were applied to data sets with different sample sizes and different extents of class separation, the latter determined by the concentration level of the spiked compounds. For each feature selection method and data set, performance in selecting features related to the spiked compounds was assessed using the harmonic mean of recall and precision (f-score) and the geometric mean of recall and the true negative rate (g-score). We conclude that the univariate t test and the mww test with multiple testing corrections are not applicable to data sets with small sample sizes (n = 6), but their performance improves markedly as sample size increases, to the point (n > 12) at which they outperform the other methods. PCDA and PLSDA select small feature sets with high precision but miss many true positive features related to the spiked peptides. NSC strikes a reasonable compromise between recall and precision for all data sets, independent of spiking level and number of samples. Linear SVM-RFE performs poorly at selecting features related to the spiked compounds, even though its classification error is relatively low.

Molecular & Cellular Proteomics 12: 10.1074/mcp.M112.022566, 263-276, 2013.
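The two evaluation scores named in the abstract can be computed from the confusion counts of a selected feature set against the known spiked features. The following is a minimal sketch, not the authors' code. It assumes each feature is labelled simply as spiked-related or not (how LC-MS features were matched to spiked peptides is not described here), uses a hypothetical helper name `selection_scores`, and assumes a score of 0 whenever a denominator is empty.

```python
import math


def selection_scores(selected, spiked, all_features):
    """Hypothetical helper: score a feature selection against known spiked features.

    selected     -- feature ids returned by a selection method
    spiked       -- feature ids related to spiked peptides (ground truth)
    all_features -- every feature id in the data set
    Assumption: each feature is simply spiked-related or not.
    """
    selected, spiked, all_features = set(selected), set(spiked), set(all_features)

    tp = len(selected & spiked)                 # spiked features that were selected
    fp = len(selected - spiked)                 # unrelated features that were selected
    fn = len(spiked - selected)                 # spiked features that were missed
    tn = len(all_features - selected - spiked)  # unrelated features correctly left out

    # Assumption: define a ratio as 0 when its denominator is empty.
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0

    # f-score: harmonic mean of recall and precision
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # g-score: geometric mean of recall and true negative rate
    g = math.sqrt(recall * tnr)

    return {"recall": recall, "precision": precision, "tnr": tnr, "f": f, "g": g}
```

For example, a method that picks 8 of 10 spiked features plus 2 unrelated ones, out of 100 features in total, has recall = precision = 0.8, so f = 0.8. Its g-score is about 0.88 because the true negative rate stays high (88/90). When unrelated features far outnumber spiked ones, the true negative rate is close to 1, so the g-score penalizes false positives less than the f-score does.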