4.7 Article

Improving deep learning-based protein distance prediction in CASP14

Journal

BIOINFORMATICS
Volume 37, Issue 19, Pages 3190-3196

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bioinformatics/btab355

Funding

  1. National Science Foundation [DBI1759934, IIS1763246]
  2. National Institutes of Health [GM093123]
  3. United States Department of Energy [DE-SC0020400, DE-SC0021303]
  4. Oak Ridge Leadership Computing Facility [BIF132]

The research aims to develop deep learning methods for accurately predicting residue-residue distances in proteins and achieve good performance in CASP14. The quality and depth of MSAs have a significant impact on the accuracy of distance prediction. Using larger training datasets and multiple complementary features improves prediction accuracy.
Motivation: Accurate prediction of residue-residue distances is important for protein structure prediction. We developed several protein distance predictors based on a deep learning distance prediction method and blindly tested them in the 14th Critical Assessment of Protein Structure Prediction (CASP14). The prediction method uses deep residual neural networks with a channel-wise attention mechanism to classify the distance between every pair of residues into multiple distance intervals. The input features for the deep learning method include co-evolutionary features as well as other sequence-based features derived from multiple sequence alignments (MSAs). Three alignment methods are used with multiple protein sequence/profile databases to generate MSAs for input feature generation. Based on different configurations and training strategies of the deep learning method, five MULTICOM distance predictors were created to participate in the CASP14 experiment.
Results: Benchmarked on 37 hard CASP14 domains, the best performing MULTICOM predictor is ranked 5th out of 30 automated CASP14 distance prediction servers in terms of precision of top L/5 long-range contact predictions [i.e. classifying distances between two residues into two categories: in contact (<8 Å) and not in contact otherwise] and performs better than the best CASP13 distance prediction method. The best performing MULTICOM predictor is also ranked 6th among automated server predictors in classifying inter-residue distances into the 10 distance intervals defined by CASP14, according to the precision of distance classification. The results show that the quality and depth of MSAs depend on the alignment methods and sequence databases used and have a significant impact on the accuracy of distance prediction. Using larger training datasets and multiple complementary features improves prediction accuracy. However, the number of effective sequences in MSAs is only a weak indicator of the quality of MSAs and the accuracy of predicted distance maps. In contrast, there is a strong correlation between the accuracy of contact/distance predictions and the average probability of the predicted contacts, which can therefore be used more effectively to estimate the confidence of distance predictions and to select predicted distance maps.
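
The two quantities compared in the Results, the precision of the top L/5 long-range contact predictions and the average probability of those predicted contacts, can be illustrated with a minimal sketch. This is not the authors' evaluation code. It assumes a square matrix of predicted contact probabilities and a matrix of true distances in Å, and it assumes the usual CASP convention that "long-range" means a sequence separation of at least 24 residues (the abstract itself does not state the cutoff). The function names and parameters are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def top_long_range_contacts(
    prob: Matrix, min_sep: int = 24, divisor: int = 5
) -> List[Tuple[float, int, int]]:
    """Return the top L/divisor long-range pairs as (probability, i, j).

    min_sep=24 is an assumed long-range cutoff, not taken from the abstract.
    """
    length = len(prob)
    # Collect each residue pair once (i < j) with separation >= min_sep.
    pairs = [
        (prob[i][j], i, j)
        for i in range(length)
        for j in range(i + min_sep, length)
    ]
    # Highest predicted contact probability first.
    pairs.sort(key=lambda item: item[0], reverse=True)
    k = max(1, length // divisor)
    return pairs[:k]


def precision_and_confidence(
    prob: Matrix,
    dist: Matrix,
    threshold: float = 8.0,
    min_sep: int = 24,
    divisor: int = 5,
) -> Tuple[float, float]:
    """Return (precision, mean predicted probability) of the top contacts.

    A pair counts as a true contact when its true distance is < threshold Å.
    The mean probability needs no true structure, so it can serve as a
    confidence estimate when comparing predicted maps.
    """
    top = top_long_range_contacts(prob, min_sep, divisor)
    if not top:
        return 0.0, 0.0
    hits = sum(1 for _, i, j in top if dist[i][j] < threshold)
    mean_prob = sum(p for p, _, _ in top) / len(top)
    return hits / len(top), mean_prob
```

Only the second return value can be computed at prediction time, which is why the abstract proposes it for ranking and selecting predicted distance maps; the first requires the experimental structure.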
