Article

IFF: Identifying key residues in intrinsically disordered regions of proteins using machine learning

PROTEIN SCIENCE (2023)

Journal

PROTEIN SCIENCE

Volume 32, Issue 9

Publisher

WILEY

DOI: 10.1002/pro.4739

Keywords

intrinsically disordered proteins; liquid-liquid phase separation; unsupervised contrastive machine learning

Category

Biochemistry & Molecular Biology

Summary

Conserved residues in protein homolog sequence alignments are important for structure or function. However, alignment often fails for intrinsically disordered proteins or regions due to lack of structure. This study proposes a method to retrieve common features in intrinsically disordered regions to identify functionally important residues. The trained model successfully predicts critical residues, allowing for improved understanding of protein functions.

Abstract

Conserved residues in protein homolog sequence alignments are structurally or functionally important. For intrinsically disordered proteins or proteins with intrinsically disordered regions (IDRs), however, alignment often fails because they lack a stable three-dimensional structure to constrain evolution. Although the sequences vary, the physicochemical features of IDRs may be preserved to maintain function. A method that retrieves common IDR features may therefore help identify functionally important residues. We applied unsupervised contrastive learning to train a model with self-attention neural networks on human IDR orthologs. The model parameters were trained so that sequences within an ortholog pair match each other but do not match other IDRs. The trained model successfully identifies critical residues previously reported in experimental studies, especially those that form an overall pattern (e.g., multiple aromatic residues or charged blocks) rather than short motifs. This predictive model can be used to identify potentially important residues in other proteins, improving our understanding of their functions. The trained model can be run directly from the Jupyter Notebook in the GitHub repository using Binder. The only required input is the primary sequence. The training scripts are available on GitHub. The training datasets have been deposited in an Open Science Framework repository.
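
The contrastive objective described in the abstract (match the two sequences of an ortholog pair, but not other IDRs) can be illustrated with a generic InfoNCE-style loss. The sketch below is not the authors' implementation: the function name `info_nce_loss`, the temperature value, the embedding size, and the random vectors standing in for encoder outputs are all assumptions made for illustration. The self-attention encoder is not modeled here.

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.1):
    """Generic InfoNCE-style loss (illustrative, hypothetical helper).

    Row i of `anchors` is treated as the ortholog partner of row i of
    `positives`; every other row in the batch serves as a negative.
    """
    # L2-normalize so that the dot product is a cosine similarity.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature  # shape (n, n)
    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The correct partner sits on the diagonal.
    return float(-np.mean(np.diag(log_probs)))

# Stand-in embeddings (assumption): 8 IDRs, 16 dimensions each.
rng = np.random.default_rng(0)
human = rng.normal(size=(8, 16))
ortholog = human + 0.05 * rng.normal(size=(8, 16))  # similar features
shuffled = ortholog[::-1].copy()                    # deliberately mismatched pairs

matched_loss = info_nce_loss(human, ortholog)
mismatched_loss = info_nce_loss(human, shuffled)
print(matched_loss < mismatched_loss)
```

Correctly paired embeddings give a lower loss than mismatched ones, which is the signal a training loop would use to pull ortholog pairs together and push unrelated IDRs apart.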
