4.7 Article

Modeling Lower-Order Statistics to Enable Decoy-Free FDR Estimation in Proteomics

Journal

JOURNAL OF PROTEOME RESEARCH
Volume 22, Issue 4, Pages 1159-1171

Publisher

AMER CHEMICAL SOC
DOI: 10.1021/acs.jproteome.2c00604

Keywords

false discovery rate; order statistics; peptide identification

One of the chief objectives in mass-spectrometry-based peptide identification in proteomics is the statistical validation of top-scoring peptide-spectrum matches (PSMs) through false discovery rate (FDR) estimation. Existing methods estimate the FDR by constructing a null model that captures the characteristics of incorrect target PSMs, most often with the help of decoys. Decoy-based methods, however, increase the computational cost and rely on the difficult-to-verify assumption that decoy PSMs constitute a sufficient and representative sample of the population of possible incorrect target PSMs. By contrast, FDR estimation assisted by the plentiful non-top-scoring PSMs, which are almost always incorrect, has scarcely been explored. In this work, we propose a novel decoy-free procedure that builds null models for top-scoring PSMs from the transformed e-value (TEV) score and the distributions of non-top-scoring target PSMs. The method relies on a theoretically derivable relationship between the parameters of the distributions of lower-order statistics of the TEV score, together with a necessary empirical optimization that fits a single parameter to the actual data. The framework was tested on multiple data sets and with two search engines. We present evidence that our method is comparable to, and occasionally outperforms, popular decoy-free and decoy-based methods in FDR estimation.
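The abstract does not define the TEV score or state the authors' derived parameter relationship, so the following is only a hypothetical sketch of the general idea: fit a null model from non-top-scoring (rank-k) PSM scores, then use it to estimate the FDR of top-scoring PSMs. It assumes that the score of an incorrect rank-1 PSM follows a Gumbel distribution with location mu and scale beta. It further assumes that the rank-k score follows the standard extreme-value limit for the k-th largest order statistic with the same mu and beta, which is equivalent to exp(-(X - mu)/beta) following a Gamma(k, 1) distribution. Both parameters are fitted by the method of moments, a stand-in for the single-parameter empirical optimization the authors describe, and the fraction of incorrect top PSMs (pi0) defaults to 1. All function names are illustrative, not from the paper.

```python
import math
import random
from statistics import fmean, pvariance

EULER_GAMMA = 0.5772156649015329


def digamma_int(k: int) -> float:
    """psi(k) for a positive integer k: -gamma + sum_{j<k} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, k))


def trigamma_int(k: int) -> float:
    """psi'(k) for a positive integer k: pi^2/6 - sum_{j<k} 1/j^2."""
    return math.pi ** 2 / 6 - sum(1.0 / j ** 2 for j in range(1, k))


def sample_rank_k(mu, beta, k, n, rng):
    """Draw n rank-k null scores under the assumed model X = mu - beta*ln(Y), Y ~ Gamma(k, 1)."""
    return [mu - beta * math.log(rng.gammavariate(k, 1.0)) for _ in range(n)]


def fit_null_from_rank(scores, k):
    """Method-of-moments fit of (mu, beta) from rank-k scores (hypothetical stand-in).

    Under the assumed model, E[X] = mu - beta*psi(k) and Var[X] = beta^2 * psi'(k).
    """
    beta = math.sqrt(pvariance(scores) / trigamma_int(k))
    mu = fmean(scores) + beta * digamma_int(k)
    return mu, beta


def gumbel_sf(t, mu, beta):
    """P(X > t) for the rank-1 (Gumbel) null; expm1 keeps far-tail values accurate."""
    return -math.expm1(-math.exp(-(t - mu) / beta))


def estimate_fdr(top_scores, t, mu, beta, pi0=1.0):
    """Estimated FDR among top-scoring PSMs with score >= t, capped at 1."""
    accepted = sum(1 for s in top_scores if s >= t)
    if accepted == 0:
        return 0.0
    expected_false = pi0 * len(top_scores) * gumbel_sf(t, mu, beta)
    return min(1.0, expected_false / accepted)


if __name__ == "__main__":
    rng = random.Random(1)
    # Synthetic rank-2 scores stand in for second-best target PSMs.
    mu_hat, beta_hat = fit_null_from_rank(sample_rank_k(10.0, 2.0, 2, 5000, rng), 2)
    # Synthetic top-scoring PSMs: half incorrect (null), half shifted upward.
    top = sample_rank_k(10.0, 2.0, 1, 2500, rng) + [x + 15.0 for x in sample_rank_k(10.0, 2.0, 1, 2500, rng)]
    for t in (15.0, 20.0, 25.0):
        print(t, estimate_fdr(top, t, mu_hat, beta_hat))
```

Because the rank-k model shares mu and beta with the rank-1 model in this sketch, parameters learned from the plentiful, almost always incorrect lower-ranked PSMs can be reused directly for the top-ranked null, which is the property that makes a decoy-free estimate possible. Whether real TEV scores satisfy this assumption is exactly what the paper's derivation and single-parameter fit address; the sketch should not be read as reproducing them.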
