4.7 Article

Prediction of missed cleavage sites in tryptic peptides aids protein identification in proteomics

Journal

JOURNAL OF PROTEOME RESEARCH
Volume 6, Issue 1, Pages 399-408

Publisher

AMER CHEMICAL SOC
DOI: 10.1021/pr060507u

Keywords

proteomics; tryptic cleavage; missed cleavage; mass spectrometry; protein identification; scoring systems

Funding

  1. BBSRC [BB/D006996/1] Funding Source: UKRI
  2. Biotechnology and Biological Sciences Research Council [BBS/B/17204, BB/D006996/1] Funding Source: Medline
  3. Biotechnology and Biological Sciences Research Council [BBS/B/17204, BB/D006996/1] Funding Source: researchfish

Protein identification via peptide mass fingerprinting (PMF) remains a key component of high-throughput proteomics experiments in post-genomic science. Candidate protein identifications are made using bioinformatic tools from peptide peak lists obtained via mass spectrometry (MS). These algorithms rely on several search parameters, including the number of potential uncut peptide bonds matching the primary specificity of the hydrolytic enzyme used in the experiment. Typically, up to one of these missed cleavages is considered by the bioinformatics search tools, usually after in silico digestion of the proteome by trypsin. Using two distinct, nonredundant datasets of peptides identified via PMF and tandem MS, a simple predictive method based on information theory is presented which is able to identify experimentally defined missed cleavages with up to 90% accuracy from amino acid sequence alone. Using this simple protocol, we are able to mask candidate protein databases so that confident missed cleavage sites need not be considered for in silico digestion. We show that this leads to an improvement in database searching, with two different search engines, using the PMF dataset as a test set. In addition, the improved approach is also demonstrated on an independent PMF dataset of known proteins that also has corresponding high-quality tandem MS data, validating the protein identifications. This approach has wider applicability for proteomics database searching, and the program for predicting missed cleavages and masking FASTA-formatted protein sequence databases has been made available via http://ispider.smith.man.ac.uk/MissedCleave.
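
To make the search parameter discussed in the abstract concrete, the following is a minimal Python sketch of in silico tryptic digestion with a configurable number of missed cleavages. It is not the authors' program. It assumes the commonly used rule that trypsin cuts after K or R unless the next residue is P, and the names `cleavage_sites`, `tryptic_digest`, and `masked_sites` are hypothetical. The `masked_sites` argument only illustrates the idea of masking: positions that some predictor has flagged as confidently missed are never treated as cut sites. The information-theoretic predictor itself is not reproduced here.

```python
def cleavage_sites(sequence, masked_sites=frozenset()):
    """Return 0-based indices i such that trypsin is assumed to cut after sequence[i].

    Assumed rule: cut after K or R, except when the next residue is P.
    Indices in masked_sites (hypothetical predictor output) are skipped.
    """
    sites = []
    # The last residue is excluded: cutting after it would yield an empty peptide.
    for i, residue in enumerate(sequence[:-1]):
        if residue in "KR" and sequence[i + 1] != "P" and i not in masked_sites:
            sites.append(i)
    return sites


def tryptic_digest(sequence, max_missed=1, masked_sites=frozenset()):
    """List peptides containing at most max_missed uncut cleavage sites."""
    if not sequence:
        return []
    cuts = cleavage_sites(sequence, masked_sites)
    # Fragment boundaries: protein start, position after each cut, protein end.
    bounds = [0] + [i + 1 for i in cuts] + [len(sequence)]
    peptides = []
    for start in range(len(bounds) - 1):
        for missed in range(max_missed + 1):
            end = start + 1 + missed
            if end >= len(bounds):
                break
            peptides.append(sequence[bounds[start]:bounds[end]])
    return peptides


if __name__ == "__main__":
    toy_protein = "AAKDERPGKCC"  # made-up sequence, for illustration only
    print(tryptic_digest(toy_protein, max_missed=0))
    print(tryptic_digest(toy_protein, max_missed=1))
    # Treat the K at index 2 as a confidently predicted missed cleavage.
    print(tryptic_digest(toy_protein, max_missed=0, masked_sites={2}))
```

In this sketch, masking a site lengthens the fully cleaved peptides that span it, so the search can cover that peptide without raising `max_missed` for every other site in the database.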
