Article

MG-BERT: leveraging unsupervised atomic representation learning for molecular property prediction

Journal

BRIEFINGS IN BIOINFORMATICS
Volume 22, Issue 6

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bib/bbab152

Keywords

molecular property prediction; molecular graph BERT; atomic representation; deep learning; self-supervised learning

Funding

  1. Changsha Municipal Natural Science Foundation [kq2014144]
  2. Changsha Science and Technology Bureau project [kq2001034]
  3. National Key Research & Development project by the Ministry of Science and Technology of China [2018YFB1003203]
  4. State Key Laboratory of High-Performance Computing [201901-11]
  5. National Science Foundation of China [U1811462]


This study introduces a molecular graph BERT (MG-BERT) model that integrates graph neural network mechanisms and utilizes a self-supervised learning strategy for pretraining, enhancing the model's contextual sensitivity and achieving outstanding performance in molecular property prediction.
Motivation: Accurate and efficient prediction of molecular properties is one of the fundamental issues in drug design and discovery pipelines. Traditional feature engineering-based approaches require extensive expertise in the feature design and selection process. With the development of artificial intelligence (AI) technologies, data-driven methods exhibit unparalleled advantages over feature engineering-based methods in various domains. Nevertheless, when applied to molecular property prediction, AI models usually suffer from the scarcity of labeled data and show poor generalization ability.

Results: In this study, we proposed molecular graph BERT (MG-BERT), which integrates the local message passing mechanism of graph neural networks (GNNs) into the powerful BERT model to facilitate learning from molecular graphs. Furthermore, an effective self-supervised learning strategy named masked atoms prediction was proposed to pretrain the MG-BERT model on a large amount of unlabeled data to mine context information in molecules. We found that the pretrained MG-BERT model can generate context-sensitive atomic representations and transfer the learned knowledge to the prediction of a variety of molecular properties. The experimental results show that the pretrained MG-BERT model, with only a little fine-tuning, can consistently outperform the state-of-the-art methods on all 11 ADMET datasets. Moreover, the MG-BERT model leverages attention mechanisms to focus on atomic features essential to the target property, providing excellent interpretability for the trained model. The MG-BERT model does not require any hand-crafted feature as input and is more reliable due to its excellent interpretability, providing a novel framework to develop state-of-the-art models for a wide range of drug discovery tasks.
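The core idea of integrating local message passing into BERT-style self-attention can be illustrated, in spirit, by restricting each atom's attention to its bonded neighbours via the molecular adjacency matrix. The sketch below is a minimal numpy illustration of this mechanism under our own assumptions, not the authors' implementation; the function names and the toy three-atom chain are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def graph_masked_attention(h, adj):
    """One simplified self-attention step in which each atom attends
    only to itself and its bonded neighbours (adj[i, j] == 1)."""
    d = h.shape[-1]
    scores = h @ h.T / np.sqrt(d)             # pairwise attention logits
    scores = np.where(adj > 0, scores, -1e9)  # mask out non-bonded pairs
    return softmax(scores) @ h                # aggregate neighbour features

# Toy molecular graph: a 3-atom chain 0-1-2, with self-loops on the diagonal.
adj = np.array([[1, 1, 0],
                [1, 1, 1],
                [0, 1, 1]], dtype=float)
rng = np.random.default_rng(0)
h = rng.normal(size=(3, 4))                   # random atom embeddings
out = graph_masked_attention(h, adj)
print(out.shape)  # (3, 4)
```

Because atoms 0 and 2 are not bonded, atom 0's updated representation is unaffected by atom 2's embedding; stacking several such layers lets information propagate along bonds, which is the local message passing behaviour the abstract attributes to the GNN component.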

