Accurate contact-based modelling of repeat proteins predicts the structure of new repeats protein families

PLOS COMPUTATIONAL BIOLOGY (2021)

Journal

PLOS COMPUTATIONAL BIOLOGY

Volume 17, Issue 4

Publisher

PUBLIC LIBRARY SCIENCE

DOI: 10.1371/journal.pcbi.1008798

Categories

Biochemical Research Methods; Mathematical & Computational Biology

Funding

European Union [823886]
Swedish Natural Science Research Council (Vetenskapsradet) [VR-NT 2016-03798]
SNIC [SNIC 2020/5300]

Abstract

Repeat proteins are highly prevalent in eukaryotic proteomes and are involved in many eukaryote-specific functions, yet they are difficult to crystallise. Deep learning-based contact prediction methods such as trRosetta, DeepMetaPsicov (DMP) and PconsC4 are effective at predicting the structure of repeat proteins: in a benchmark dataset of 815 repeat proteins, about 90% can be correctly modelled. Furthermore, of 48 PFAM families lacking a protein structure, models with estimated high accuracy were produced for forty-one.

Author summary

Repeat proteins are widespread among organisms and particularly abundant in eukaryotic proteomes. Their primary sequences contain repeated amino acid segments that give rise to structures with repeated folds or domains. Although the repeated units can often be recognised from the sequence alone, structural information is frequently missing. Here, we used contact prediction to predict the structure of repeat proteins directly from their primary sequences. We benchmark the methods on a dataset comprising all known repeat protein structures. We evaluate the contact predictions and the resulting models for different classes of repeat proteins. Further, we develop and benchmark a quality assessment (QA) method specific to repeat proteins. Finally, we applied the prediction pipeline to all PFAM repeat families without resolved structures and found that forty-one of them could be modelled with high accuracy.

Repeat proteins are abundant in eukaryotic proteomes. They are involved in many eukaryote-specific functions, including signalling. For many of these proteins the structure is not known, as they are difficult to crystallise. Today, using direct coupling analysis and deep learning, it is often possible to predict a protein's structure. However, the unique sequence features of repeat proteins have made it challenging to use direct coupling analysis for predicting contacts. Here, we show that deep learning-based methods (trRosetta, DeepMetaPsicov (DMP) and PconsC4) overcome this problem and can predict intra- and inter-unit contacts in repeat proteins. In a benchmark dataset of 815 repeat proteins, about 90% can be correctly modelled. Further, among 48 PFAM families lacking a protein structure, we produce models of forty-one families with estimated high accuracy.
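To make the distinction between intra- and inter-unit contacts concrete, the following is a minimal sketch of how the precision of predicted contacts could be scored separately for the two classes. It is not the authors' evaluation code: the function names, the input layout (0-based residue indices, a list of repeat-unit start positions, a set of true contacts) and the toy data are all assumptions made for illustration.

```python
from bisect import bisect_right


def unit_index(residue, unit_starts):
    """Index of the repeat unit containing `residue` (hypothetical helper).

    `unit_starts` is a sorted list of 0-based start positions of each unit.
    """
    return bisect_right(unit_starts, residue) - 1


def contact_precision(predicted, true_contacts, unit_starts, top_k):
    """Precision of the top_k highest-scoring predictions, split by class.

    `predicted` holds (i, j, score) tuples; `true_contacts` holds (i, j)
    pairs with i < j. A contact is "intra" when both residues fall in the
    same repeat unit and "inter" otherwise.
    """
    ranked = sorted(predicted, key=lambda c: c[2], reverse=True)[:top_k]
    counts = {"intra": [0, 0], "inter": [0, 0]}  # [correct, total]
    for i, j, _score in ranked:
        same_unit = unit_index(i, unit_starts) == unit_index(j, unit_starts)
        kind = "intra" if same_unit else "inter"
        counts[kind][1] += 1
        if (min(i, j), max(i, j)) in true_contacts:
            counts[kind][0] += 1
    # None signals that no prediction of that class was in the top_k.
    return {k: (c / n if n else None) for k, (c, n) in counts.items()}


# Toy data (invented): three units of ten residues each.
unit_starts = [0, 10, 20]
true_contacts = {(1, 6), (3, 14), (12, 18)}
predicted = [(1, 6, 0.9), (3, 14, 0.8), (2, 25, 0.7), (12, 18, 0.6), (5, 9, 0.2)]
result = contact_precision(predicted, true_contacts, unit_starts, top_k=4)
print(result)
```

In practice the unit boundaries would come from a repeat annotation and the true contacts from a resolved structure; both are simply hard-coded here.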
