
DeepDigest: Prediction of Protein Proteolytic Digestion with Deep Learning

ANALYTICAL CHEMISTRY (2021)

Journal

ANALYTICAL CHEMISTRY

Volume 93, Issue 15, Pages 6094-6103

Publisher

AMER CHEMICAL SOC

DOI: 10.1021/acs.analchem.0c04704

Category

Chemistry, Analytical

Funding

National Natural Science Foundation of China [32070668]
National Key R&D Program of China [2020YFE0202200]
Innovation Foundation of Medicine of China [20SWAQX34]

Summary

Proteolytic digestion of proteins is a key step in shotgun proteomics, but digestion is rarely complete and leaves missed cleavage sites. DeepDigest, a deep learning algorithm, predicts the cleavage probability of potential cleavage sites for eight common proteases and outperforms traditional machine learning algorithms in most cases. Transfer learning further improves prediction accuracy, and the study reveals characteristics of the different proteases.

Abstract

Proteolytic digestion of proteins by one or more proteases is a key step in shotgun proteomics, in which the proteolytic products, i.e., peptides, are taken as surrogates of their parent proteins for further qualitative or quantitative analysis. Proteases generally cleave proteins at specific amino acid residue sites, but digestion is rarely complete, and missed cleavage sites are widespread. Accurately modeling and predicting the digestion behavior of proteases would therefore help both prior experimental design and posterior data analysis. At present, systematic studies of the proteases commonly used in proteomics are insufficient, and easy-to-use tools for predicting the cleavage sites of different proteases are lacking. Here, we propose a novel sequence-based deep learning algorithm, DeepDigest, which integrates convolutional neural networks and long short-term memory networks for protein digestion prediction. DeepDigest can predict the cleavage probability of each potential cleavage site on a protein sequence for eight popular proteases: trypsin, ArgC, chymotrypsin, GluC, LysC, AspN, LysN, and LysargiNase. We compared DeepDigest with three traditional machine learning algorithms, i.e., logistic regression, random forest, and support vector machine. On the eight training data sets, the 10-fold cross-validation AUCs of DeepDigest were 0.956-0.982, significantly higher than those of the three traditional algorithms. On the 11 independent test data sets, DeepDigest achieved AUCs between 0.849 and 0.978, outperforming the traditional algorithms in most cases. Transfer learning then further improved the prediction accuracy. In addition, some interesting characteristics of different proteases were revealed and discussed. Finally, as an application, we used DeepDigest to predict the digestibilities of peptides and demonstrated that peptide digestibility is an informative new feature for discriminating between correct and incorrect peptide identifications.
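
To make the notions of "potential cleavage site" and "peptide digestibility" concrete, the following is a minimal Python sketch and not the DeepDigest implementation. It assumes the simplified textbook rule that trypsin, LysC, and ArgC cleave on the C-terminal side of K/R, K, and R respectively. The window size, the padding symbol, and the function names are hypothetical. The digestibility score rests on an illustrative assumption that sites are cleaved independently: the peptide's two terminal sites must be cleaved and its internal sites must be missed. The abstract does not state how DeepDigest combines site probabilities, and the probabilities used here stand in for the output of any site-level predictor.

```python
from typing import Dict, List

# Assumed, simplified specificity: residues after which each protease cleaves.
CLEAVE_AFTER = {"trypsin": "KR", "LysC": "K", "ArgC": "R"}


def candidate_sites(protein: str, protease: str = "trypsin") -> List[int]:
    """Return 0-based indices i such that the bond between i and i+1 is a
    potential cleavage site. The last residue has no following bond."""
    residues = CLEAVE_AFTER[protease]
    return [i for i, aa in enumerate(protein[:-1]) if aa in residues]


def site_window(protein: str, site: int, flank: int = 3, pad: str = "Z") -> str:
    """Return `flank` residues on each side of the bond after `site`,
    padding with `pad` beyond the protein termini (hypothetical encoding)."""
    padded = pad * flank + protein + pad * flank
    return padded[site + 1 : site + 2 * flank + 1]


def peptide_digestibility(start: int, end: int,
                          site_prob: Dict[int, float],
                          protein_len: int) -> float:
    """Score the peptide covering residues start..end (inclusive), assuming
    independent sites: terminal bonds cleaved, internal bonds missed.
    Bonds absent from `site_prob` are treated as having probability 0."""
    score = 1.0
    if start > 0:                      # bond preceding the peptide
        score *= site_prob.get(start - 1, 0.0)
    if end < protein_len - 1:          # bond following the peptide
        score *= site_prob.get(end, 0.0)
    for i in range(start, end):        # internal bonds must stay intact
        score *= 1.0 - site_prob.get(i, 0.0)
    return score


protein = "MKTAYRAK"
sites = candidate_sites(protein)       # [1, 5]
windows = [site_window(protein, s) for s in sites]
# Made-up cleavage probabilities standing in for a predictor's output.
probs = {1: 0.9, 5: 0.8}
fully_cleaved = peptide_digestibility(2, 5, probs, len(protein))   # "TAYR"
one_missed = peptide_digestibility(2, 7, probs, len(protein))      # "TAYRAK"
```

In this toy example the fully cleaved peptide "TAYR" scores higher than "TAYRAK", which contains a missed cleavage at a site with high cleavage probability. A score of this kind is one way a digestibility feature could help separate plausible from implausible peptide identifications.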
