Flexible Data Analysis Pipeline for High-Confidence Proteogenomics

Journal

JOURNAL OF PROTEOME RESEARCH
Volume 15, Issue 12, Pages 4686-4695

Publisher

AMER CHEMICAL SOC
DOI: 10.1021/acs.jproteome.6b00765

Keywords

proteogenomics; bioinformatics; workflow; mass spectrometry; genome annotation; testis

Funding

  1. Wellcome Trust [WT098051]
  2. National Institutes of Health [U41HG007234]

Proteogenomics leverages information derived from proteomic data to improve genome annotations. Of particular interest are novel peptides that provide direct evidence of protein expression for genomic regions not previously annotated as protein-coding. We present a modular, automated data analysis pipeline aimed at detecting such novel peptides in proteomic data sets. This pipeline implements criteria developed by proteomics and genome annotation experts for high-stringency peptide identification and filtering. Our pipeline is based on the OpenMS computational framework; it incorporates multiple database search engines for peptide identification and applies a machine-learning approach (Percolator) to post-process search results. We describe several new and improved software tools, developed to facilitate proteogenomic analyses, that extend the wealth of tools provided by OpenMS. We demonstrate the application of our pipeline to a human testis tissue data set previously acquired for the Chromosome-Centric Human Proteome Project, which led to the addition of five new gene annotations on the human reference genome.
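
As a rough illustration of the filtering idea described above, the following minimal sketch keeps only confidently identified peptides that map exclusively to non-annotated sequences. It is not the pipeline's actual implementation and does not use OpenMS or Percolator. The `PSM` record, the `NOVEL_` accession prefix, and the 1% q-value cutoff are all assumptions made for this example; the abstract does not state the criteria the authors applied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PSM:
    """Hypothetical peptide-spectrum match record (not an OpenMS type)."""
    peptide: str
    q_value: float        # assumed to come from a post-processing step
    accessions: tuple     # sequence database entries the peptide maps to


def is_novel(psm, novel_prefix="NOVEL_"):
    """True only if every accession comes from the assumed novel set."""
    return bool(psm.accessions) and all(
        acc.startswith(novel_prefix) for acc in psm.accessions
    )


def novel_peptides(psms, max_q=0.01, novel_prefix="NOVEL_"):
    """Return sorted unique peptides passing the cutoff and the novelty check."""
    kept = {
        p.peptide
        for p in psms
        if p.q_value <= max_q and is_novel(p, novel_prefix)
    }
    return sorted(kept)


example = [
    PSM("AAAK", 0.001, ("NOVEL_1",)),             # confident, novel only
    PSM("CCCK", 0.001, ("ENSP0001", "NOVEL_2")),  # also matches a known protein
    PSM("DDDK", 0.2, ("NOVEL_3",)),               # fails the q-value cutoff
]
print(novel_peptides(example))
```

Requiring that every accession be novel is a deliberately conservative choice in this sketch: a peptide that can also be explained by an already annotated protein provides no evidence for a new coding region.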