Article; Proceedings Paper

Non-parametric estimation of posterior error probabilities associated with peptides identified by tandem mass spectrometry

Journal

BIOINFORMATICS
Volume 24, Issue 16, Pages I42-I48

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bioinformatics/btn294

Funding

  1. NCRR NIH HHS [P41 RR011823-13, P41 RR011823, P41 RR11823] Funding Source: Medline
  2. NIBIB NIH HHS [R01 EB007057] Funding Source: Medline

Motivation: A mass spectrum produced via tandem mass spectrometry can be tentatively matched to a peptide sequence via database search. Here, we address the problem of assigning a posterior error probability (PEP) to a given peptide-spectrum match (PSM). This problem is considerably more difficult than the related problem of estimating the error rate associated with a large collection of PSMs. Existing methods for estimating PEPs rely on a parametric or semiparametric model of the underlying score distribution.

Results: We demonstrate how to apply non-parametric logistic regression to this problem. The method makes no explicit assumptions about the form of the underlying score distribution; instead, the method relies upon decoy PSMs, produced by searching the spectra against a decoy sequence database, to provide a model of the null score distribution. We show that our non-parametric logistic regression method produces accurate PEP estimates for six different commonly used PSM score functions. In particular, the estimates produced by our method are comparable in accuracy to those of PeptideProphet, which uses a parametric or semiparametric model designed specifically to work with SEQUEST. The advantage of the non-parametric approach is applicability and robustness to new score functions and new types of data.
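To make the role of the decoy PSMs concrete, here is a minimal Python sketch of a decoy-based PEP estimate. It is not the paper's method: it swaps the non-parametric logistic regression for a much cruder equal-count binning of the pooled scores, and it only illustrates the idea that decoy scores stand in for the null distribution. The function name `fit_pep`, the bin count, the 0.5 pseudo-count, the default `pi0 = 1.0` (a conservative guess at the fraction of incorrect target PSMs), and the assumption that higher scores are better are all choices made for this illustration, as is the simulated data.

```python
import random
from bisect import bisect_right


def fit_pep(target_scores, decoy_scores, n_bins=10, pi0=1.0):
    """Return a function mapping a PSM score to a rough PEP estimate.

    Illustrative only: within each equal-count bin of the pooled scores,
    PEP is approximated as pi0 * (n_targets / n_decoys) * decoys / targets.
    Assumes higher scores indicate better matches.
    """
    # Pool the scores, labelling decoys with 1 and targets with 0.
    pooled = sorted([(s, 0) for s in target_scores] +
                    [(s, 1) for s in decoy_scores])
    size = max(1, len(pooled) // n_bins)
    ratio = len(target_scores) / len(decoy_scores)

    edges, peps = [], []
    for start in range(0, len(pooled), size):
        chunk = pooled[start:start + size]
        decoys = sum(label for _, label in chunk)
        targets = len(chunk) - decoys
        # Pseudo-counts of 0.5 (an assumption) avoid division by zero.
        pep = pi0 * ratio * (decoys + 0.5) / (targets + 0.5)
        edges.append(chunk[0][0])      # lower score edge of this bin
        peps.append(min(1.0, pep))     # a probability cannot exceed 1

    # Force the estimate to be non-increasing as the score improves.
    for i in range(1, len(peps)):
        peps[i] = min(peps[i], peps[i - 1])

    def pep_of(score):
        i = bisect_right(edges, score) - 1
        return peps[max(i, 0)]         # scores below all edges use bin 0

    return pep_of


# Simulated scores (hypothetical data, not from the paper):
# decoys and incorrect targets share one distribution, correct targets score higher.
random.seed(0)
decoys = [random.gauss(0.0, 1.0) for _ in range(2000)]
targets = ([random.gauss(0.0, 1.0) for _ in range(1000)] +
           [random.gauss(4.0, 1.0) for _ in range(1000)])

pep_of = fit_pep(targets, decoys)
print(pep_of(-1.0), pep_of(2.0), pep_of(5.0))
```

In this simulation a low-scoring PSM should receive a PEP near 1 and a high-scoring PSM a PEP near 0. A smoother estimator, such as the logistic regression described in the abstract, would avoid the step artifacts that binning introduces.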
