Article

Predicting protein-peptide binding residues via interpretable deep learning

Journal

BIOINFORMATICS
Volume 38, Issue 13, Pages 3351-3360

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bioinformatics/btac352

Funding

  1. National Natural Science Foundation of China [62071278]

In this study, we propose a BERT-based contrastive learning framework called PepBCL for predicting protein-peptide binding residues. This method eliminates the need for complex feature engineering by utilizing a well-pretrained protein language model to automatically extract and learn feature representations. Additionally, a contrastive learning module is used to optimize the feature representations of binding residues within the imbalanced dataset, resulting in improved performance. Experimental results demonstrate that our method outperforms existing techniques, and the integration of traditional features and learned features further enhances performance.
Summary: Identifying protein-peptide binding residues is fundamentally important for understanding the mechanisms of protein function and for drug discovery. Although several computational methods have been developed, most rely heavily on third-party tools or complex data preprocessing for feature design, which tends to reduce computational efficiency and limit predictive performance. To address these limitations, we propose PepBCL, a novel BERT (Bidirectional Encoder Representations from Transformers)-based contrastive learning framework that predicts protein-peptide binding residues from protein sequences alone. PepBCL is an end-to-end predictive model that is independent of feature engineering. Specifically, we introduce a well pre-trained protein language model that automatically extracts and learns high-level latent representations of protein sequences relevant to protein structure and function. Further, we design a novel contrastive learning module to optimize the feature representations of binding residues in the imbalanced dataset. We demonstrate that PepBCL significantly outperforms state-of-the-art methods in benchmark comparisons and achieves more robust performance. Moreover, we found that integrating traditional features with our learnt features further improves performance. Interestingly, interpretability analysis of our model highlights the flexibility and adaptability of the deep learning-based protein language model in capturing both conserved and non-conserved sequence characteristics of peptide-binding residues. Finally, to facilitate the use of our method, we provide an online prediction platform implementing PepBCL, available at http://server.wei-group.net/PepBCL/.
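The abstract mentions a contrastive learning module that reshapes per-residue embeddings under class imbalance, but it does not give the loss itself. As a hedged illustration only, the sketch below implements a generic supervised contrastive loss (SupCon-style) over per-residue embeddings. This is a swapped-in, well-known technique and not necessarily PepBCL's exact formulation. The function name `supervised_contrastive_loss`, the `temperature` default, and the toy embeddings are all hypothetical. In the sketch, residues that share a label (binding or non-binding) are pulled together and residues with different labels are pushed apart.

```python
import numpy as np


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss over residue embeddings (illustrative sketch).

    embeddings: (n, d) array; a hypothetical stand-in for per-residue
        protein language model outputs.
    labels: (n,) array of 0/1 labels (1 = peptide-binding residue, assumed encoding).
    temperature: assumed scaling factor for cosine similarities.
    """
    labels = np.asarray(labels)
    # L2-normalize so the dot product equals cosine similarity
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    n = len(labels)
    self_mask = np.eye(n, dtype=bool)
    # Exclude each residue's similarity with itself from the softmax
    sim = np.where(self_mask, -np.inf, sim)
    # Numerically stable row-wise log-softmax
    row_max = np.max(sim, axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(
        np.sum(np.exp(sim - row_max), axis=1, keepdims=True)
    )
    # Positives: other residues with the same label
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    pos_counts = pos_mask.sum(axis=1)
    valid = pos_counts > 0
    if not valid.any():
        return 0.0  # no anchor has a positive partner
    # Mean log-probability of positives, for anchors that have at least one positive
    summed = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    mean_log_prob_pos = summed[valid] / pos_counts[valid]
    return float(-mean_log_prob_pos.mean())


if __name__ == "__main__":
    emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0],
                    [0.1, 0.9], [0.05, 1.0], [0.0, 0.95]])
    # Labels that agree with the embedding clusters should give a lower loss
    print(supervised_contrastive_loss(emb, np.array([1, 1, 0, 0, 0, 0])))
    print(supervised_contrastive_loss(emb, np.array([1, 0, 1, 0, 0, 0])))
```

In practice, a loss like this would be computed inside an autodiff framework on mini-batches of residues and combined with a classification loss. NumPy is used here only to keep the example self-contained. A minority-class residue with no same-label partner in a batch contributes nothing to this loss. This is one reason batch composition matters on imbalanced data.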
