Article

Feature Selection Methods for Protein Biomarker Discovery from Proteomics or Multiomics Data

MOLECULAR & CELLULAR PROTEOMICS (2021)

Journal

MOLECULAR & CELLULAR PROTEOMICS

Volume 20, Issue -, Pages -

Publisher

ELSEVIER

DOI: 10.1016/j.mcpro.2021.100083

Category

Biochemical Research Methods

Funding

National Cancer Institute [R01CA245903]
Cancer Prevention and Research Institute of Texas [CPRIT RR160027]
McNair Medical Institute at The Robert and Janice McNair Foundation

AI Summary

ProMS is a computational algorithm for selecting protein markers based on clustering, showing superior performance in two clinically relevant classification problems compared to existing feature selection methods. It can be extended to the multiomics setting through a constrained weighted k-medoids clustering algorithm, leading to improved performance on independent test data. In addition to performance, ProMS and ProMS_mo provide functional interpretation of selected protein markers and facilitate robust transition to verification and validation platforms.

Abstract

Untargeted mass spectrometry (MS)-based proteomics provides a powerful platform for protein biomarker discovery, but clinical translation depends on the selection of a small number of proteins for downstream verification and validation. Due to the small sample size of typical discovery studies, protein markers identified from discovery data may not be generalizable to independent datasets. In addition, a good protein marker identified using a discovery platform may be difficult to implement in verification and validation platforms. Moreover, although multiomics characterization is being increasingly used in discovery cohort studies, there is no existing method for multiomics-facilitated protein biomarker selection. Here, we present ProMS, a computational algorithm for protein marker selection. The algorithm is based on the hypothesis that a phenotype is characterized by a few underlying biological functions, each manifested by a group of coexpressed proteins. A weighted k-medoids clustering algorithm is applied to all univariately informative proteins to identify both coexpressed protein clusters and a representative protein for each cluster as markers. In two clinically important classification problems, ProMS shows superior performance compared with existing feature selection methods. ProMS can be extended to the multiomics setting (ProMS_mo) through a constrained weighted k-medoids clustering algorithm, and the protein panels selected by ProMS_mo show improved performance on independent test data compared with ProMS. In addition to superior performance, ProMS and ProMS_mo also have two unique strengths. First, the feature clusters enable functional interpretation of the selected protein markers. Second, the feature clusters provide an opportunity to select replacement protein markers, facilitating a robust transition to the verification and validation platforms. 
In summary, this study provides a unified and effective computational framework for selecting protein biomarkers using proteomics or multiomics data. The software implementation is publicly available at https://github.com/bzhanglab/proms.
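
The selection idea described in the abstract (filter univariately informative proteins, cluster them with weighted k-medoids, report each cluster's medoid as a marker) can be illustrated with a minimal sketch. This is not the authors' implementation and does not use the proms package. The function names, the Welch-style t statistic used as the univariate score, the top-fraction filter, the `1 - correlation` distance, and the use of the score as the clustering weight are all assumptions made for illustration. The multiomics constraint of ProMS_mo is not covered.

```python
import numpy as np

def univariate_scores(X, y):
    """Absolute Welch-style t statistic per protein for binary labels y (0/1).
    Assumption: the paper's univariate filter may use a different statistic."""
    a, b = X[y == 0], X[y == 1]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return np.abs(a.mean(axis=0) - b.mean(axis=0)) / (se + 1e-12)

def weighted_k_medoids(D, w, k, n_iter=100, seed=0):
    """Alternating k-medoids on a precomputed distance matrix D.
    Within a cluster, the medoid i minimizes sum_j w[j] * D[i, j] over members j,
    so highly weighted members pull the medoid toward themselves.
    Assumption: this weighting scheme is illustrative only."""
    rng = np.random.default_rng(seed)
    n = D.shape[0]
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(n_iter):
        # Assign every point to its nearest current medoid.
        labels = np.argmin(D[:, medoids], axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue  # keep the old medoid for an empty cluster
            cost = D[np.ix_(members, members)] @ w[members]
            new_medoids[c] = members[np.argmin(cost)]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = np.argmin(D[:, medoids], axis=1)
    return medoids, labels

def select_markers(X, y, k, keep_fraction=0.2):
    """Return (marker column indices, kept column indices, cluster labels of kept)."""
    scores = univariate_scores(X, y)
    n_keep = min(X.shape[1], max(k, int(round(keep_fraction * X.shape[1]))))
    kept = np.argsort(scores)[::-1][:n_keep]          # top-scoring proteins
    corr = np.corrcoef(X[:, kept], rowvar=False)      # coexpression among kept
    D = np.clip(1.0 - corr, 0.0, 2.0)
    np.fill_diagonal(D, 0.0)
    medoids, labels = weighted_k_medoids(D, scores[kept], k)
    return kept[medoids], kept, labels

# Synthetic demo: two coexpressed protein groups that track the phenotype,
# plus uninformative noise proteins.
rng = np.random.default_rng(1)
y = np.repeat([0, 1], 30)
up = y[:, None] + 0.3 * rng.normal(size=(60, 5))
down = -y[:, None] + 0.3 * rng.normal(size=(60, 5))
noise = rng.normal(size=(60, 20))
X = np.hstack([up, down, noise])

markers, kept, labels = select_markers(X, y, k=2, keep_fraction=1 / 3)
print("selected marker columns:", markers)
# Other members of a marker's cluster are candidate replacement markers.
for c, m in enumerate(markers):
    print("cluster", c, "marker", m, "members", kept[labels == c])
```

In this sketch the cluster membership returned alongside the markers is what would support the two strengths named in the abstract: the members of a cluster can be inspected for shared function, and any member can be considered as a replacement when the medoid protein is hard to measure on a verification platform.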
