4.7 Article

Alignment-free metal ion-binding site prediction from protein sequence through pretrained language model and multi-task learning

Journal

BRIEFINGS IN BIOINFORMATICS
Volume 23, Issue 6, Pages -

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bib/bbac444

Keywords

metal ion-binding site; alignment-free; pretrained language model; multi-task learning

Funding

  1. National Key R&D Program of China [2020YFB0204803]
  2. National Natural Science Foundation of China [61772566, 62041209]
  3. Guangdong Key Field RD Plan [2019B020228001, 2018B010109006]
  4. Introducing Innovative and Entrepreneurial Teams [2016ZT06D211]
  5. Guangzhou ST Research Plan [202007030010]

LMetalSite is an alignment-free sequence-based predictor for metal ion-binding sites. It leverages pretrained language models and transformers to improve prediction accuracy, and adopts multi-task learning to compensate for the scarcity of training data and capture the intrinsic similarities between different metal ions.
More than one-third of the proteins in the Protein Data Bank contain metal ions. Correct identification of metal ion-binding residues is important for understanding protein functions and designing novel drugs. Because metal ions are small and highly versatile, it remains challenging to computationally predict their binding sites from protein sequence. Existing sequence-based methods have low accuracy because they lack structural information, and they are time-consuming because they rely on multiple sequence alignment. Here, we propose LMetalSite, an alignment-free sequence-based predictor for binding sites of the four most frequently seen metal ions in BioLiP (Zn2+, Ca2+, Mg2+ and Mn2+). LMetalSite leverages a pretrained language model to rapidly generate informative sequence representations and employs a transformer to capture long-range dependencies. Multi-task learning is adopted to compensate for the scarcity of training data and to capture the intrinsic similarities between different metal ions. LMetalSite was shown to surpass state-of-the-art structure-based methods by more than 19.7, 14.4, 36.8 and 12.6% in area under the precision-recall curve on the four independent tests, respectively. Further analyses indicated that the self-attention modules are effective at learning the structural contexts of residues from protein sequence. We provide the data sets, source code and trained models of LMetalSite at https://github.com/biomed-AI/LMetalSite.
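
The abstract reports its gains as area under the precision-recall curve, computed separately for each metal ion. As an illustration only, the sketch below estimates that area with average precision, one common step-wise estimator; whether LMetalSite uses this exact estimator is not stated here. The function name `average_precision` and the toy labels and scores are hypothetical and are not taken from the paper or its repository.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise estimate of the area under the precision-recall curve.

    labels: 1 for a binding residue, 0 otherwise (one entry per residue).
    scores: predicted binding probability for the same residues.
    Assumption: tied scores are ranked in input order.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive residue is required")

    # Rank residues from most to least confident prediction.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    true_positives = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            true_positives += 1
            # Precision at the rank where this positive is recovered.
            precision_sum += true_positives / rank
    return precision_sum / n_pos


# Toy per-residue example for a single ion type (made-up values).
toy_labels = [0, 1, 0, 0, 1, 0]
toy_scores = [0.10, 0.90, 0.20, 0.40, 0.30, 0.05]
toy_ap = average_precision(toy_labels, toy_scores)
```

Binding residues are a small minority of all residues, so a precision-recall summary is more sensitive to false positives than ROC-based metrics, which is a common reason to report it for this kind of task.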
