The impact of noise and missing fragmentation cleavages on de novo peptide identification algorithms

COMPUTATIONAL AND STRUCTURAL BIOTECHNOLOGY JOURNAL (2022)

Journal

COMPUTATIONAL AND STRUCTURAL BIOTECHNOLOGY JOURNAL

Volume 20, Pages 1402-1412

Publisher

ELSEVIER

DOI: 10.1016/j.csbj.2022.03.008

Keywords

De novo peptide sequencing; Machine learning; Peptide identification; Noise; Fragmentation cleavage sites; Peptide fragmentation

Funding

Irish Research Council [GOIPG/2019/1650]

Abstract

Proteomics aims to characterise system-wide protein expression and typically relies on mass spectrometry and peptide fragmentation, followed by a database search for protein identification. It has wide-ranging applications, from clinical to environmental settings, and impacts virtually every area of biology. In that context, de novo peptide sequencing is becoming increasingly popular. Historically, its performance lagged behind that of database search methods, but with the integration of machine learning this field of research is gaining momentum. To enable de novo peptide sequencing to realise its full potential, it is critical to explore the mass spectrometry data underpinning peptide identification. In this research we investigate the characteristics of tandem mass spectra using 8 published datasets. We then evaluate two state-of-the-art de novo peptide sequencing algorithms, Novor and DeepNovo, with a particular focus on their performance with regard to missing fragmentation cleavage sites and noise. DeepNovo was found to perform better than Novor overall. However, Novor recalled more correct amino acids when 6 or more cleavage sites were missing. Furthermore, less than 11% of each algorithm's correct peptide predictions originate from spectra with more than one missing cleavage site, highlighting the problems that missing cleavages pose. We further investigate how the algorithms manage to correctly identify peptides with many of these missing fragmentation cleavages. We show that noise negatively impacts the performance of both algorithms when high-intensity peaks are considered. Finally, we provide recommendations for further algorithm improvements and offer potential avenues to overcome current inherent data limitations. (C) 2022 The Authors. Published by Elsevier B.V. on behalf of Research Network of Computational and Structural Biotechnology. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
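To make the two spectrum properties at the centre of this study more concrete, the following is a minimal sketch (not the authors' code) of how missing fragmentation cleavage sites and noise peaks might be counted for a spectrum whose peptide sequence is already known. It assumes only singly charged b- and y-ions, unmodified residues, and a fixed 0.02 Da match tolerance. It also assumes that a cleavage site counts as missing when neither of its two complementary ions is observed and that any peak explaining no b- or y-ion counts as noise. The function names (`fragment_ions`, `missing_cleavages`, `noise_peaks`, `most_intense`) are hypothetical.

```python
# Hypothetical sketch: count missing cleavage sites and noise peaks for a
# known peptide. ASSUMPTIONS (not from the paper): singly charged b/y ions
# only, no modifications, fixed 0.02 Da tolerance.
from typing import List, Tuple

PROTON = 1.007276   # proton mass (Da)
WATER = 18.010565   # monoisotopic mass of H2O (Da)

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}


def fragment_ions(peptide: str) -> List[Tuple[float, float]]:
    """For each cleavage site k (between residues k and k+1), return the
    singly charged m/z pair (b_k, y_{n-k})."""
    masses = [RESIDUE[aa] for aa in peptide]
    total = sum(masses)
    pairs = []
    prefix = 0.0
    for k in range(1, len(peptide)):
        prefix += masses[k - 1]
        b = prefix + PROTON                     # N-terminal fragment
        y = (total - prefix) + WATER + PROTON   # complementary C-terminal fragment
        pairs.append((b, y))
    return pairs


def _matches(mz: float, peaks: List[float], tol: float) -> bool:
    """True if any peak lies within tol Da of mz."""
    return any(abs(mz - p) <= tol for p in peaks)


def missing_cleavages(peptide: str, peaks: List[float], tol: float = 0.02) -> int:
    """Count cleavage sites for which neither the b- nor the y-ion is observed."""
    return sum(
        1 for b, y in fragment_ions(peptide)
        if not (_matches(b, peaks, tol) or _matches(y, peaks, tol))
    )


def noise_peaks(peptide: str, peaks: List[float], tol: float = 0.02) -> int:
    """Count peaks that do not match any theoretical b- or y-ion."""
    theoretical = [mz for pair in fragment_ions(peptide) for mz in pair]
    return sum(1 for p in peaks if not _matches(p, theoretical, tol))


def most_intense(spectrum: List[Tuple[float, float]], n: int) -> List[float]:
    """Return the m/z values of the n most intense (m/z, intensity) peaks."""
    ranked = sorted(spectrum, key=lambda peak: peak[1], reverse=True)
    return [mz for mz, _ in ranked[:n]]


if __name__ == "__main__":
    peaks = [98.06, 148.06, 227.10, 477.22, 500.00]
    print(missing_cleavages("PEPTIDE", peaks))  # → 2 (sites after T and after I)
    print(noise_peaks("PEPTIDE", peaks))        # → 1 (the 500.00 peak)
```

Because the abstract ties the effect of noise to high-intensity peaks, one could pass `most_intense(spectrum, n)` instead of the full peak list to `noise_peaks` to look only at the top-n peaks; the choice of n here is an illustrative assumption, not a value taken from the paper.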
