Article

Chemical toxicity prediction based on semi-supervised learning and graph convolutional neural network

Journal

JOURNAL OF CHEMINFORMATICS
Volume 13, Issue 1

Publisher

BMC
DOI: 10.1186/s13321-021-00570-8

Keywords

Chemical toxicity; Deep learning; Graph convolutional neural network; Semi-supervised learning; Mean teacher; Tox21; ADMET

Funding

  1. University of Macau [MYRG2019-00098-FST]

This study introduces a Graph Convolution Neural Network (GCN) model for predicting chemical toxicity, trained using the Mean Teacher (MT) semi-supervised learning algorithm. The model outperforms GCN models trained by supervised learning as well as conventional machine learning methods, demonstrating the potential of learning from unannotated data to enhance predictive power.
Because safety is one of the most important properties of a drug, chemical toxicity prediction has received increasing attention in drug discovery research. Traditionally, researchers rely on in vitro and in vivo experiments to test the toxicity of chemical compounds. However, these experiments are time-consuming and costly, and those that involve animal testing are increasingly subject to ethical concerns. While traditional machine learning (ML) methods have been used in the field with some success, the limited availability of annotated toxicity data is the major hurdle to further improving model performance. Inspired by the success of semi-supervised learning (SSL) algorithms, we propose a Graph Convolution Neural Network (GCN) to predict chemical toxicity and train the network with the Mean Teacher (MT) SSL algorithm. Using the Tox21 data, our optimal SSL-GCN models for predicting the twelve toxicological endpoints achieve an average ROC-AUC score of 0.757 on the test set, a 6% improvement over GCN models trained by supervised learning and over conventional ML methods. Our SSL-GCN models also exhibit superior performance when compared to models constructed using the built-in DeepChem ML methods. This study demonstrates that SSL can increase the predictive power of models by learning from unannotated data. The optimal ratio of unannotated to annotated data lies between 1:1 and 4:1. These results demonstrate the success of SSL in chemical toxicity prediction; the same technique is expected to benefit other chemical property prediction tasks by utilizing existing large chemical databases. Our optimal model SSL-GCN is hosted on an online server accessible through: https://app.cbbio.online/ssl-.gcn/home.
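
The abstract names the Mean Teacher algorithm but does not spell out its mechanics. The following is a minimal sketch of the general idea, not the authors' implementation: a student model is trained on a supervised loss over annotated compounds plus a consistency loss that penalizes disagreement with a teacher model on all compounds, and the teacher's weights are an exponential moving average (EMA) of the student's. The toy linear classifier standing in for the GCN, the function names, the `alpha` and `consistency_weight` values, and the omission of input noise are all assumptions made for illustration.

```python
import math

def ema_update(teacher, student, alpha=0.99):
    """Teacher weights track the student via an exponential moving average."""
    return [alpha * t + (1.0 - alpha) * s for t, s in zip(teacher, student)]

def predict(weights, features):
    """Toy linear classifier (a stand-in for the GCN): returns P(toxic)."""
    z = sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def bce(p, y, eps=1e-12):
    """Binary cross-entropy for one prediction p and one label y in {0, 1}."""
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))

def mean_teacher_loss(student, teacher, labeled, unlabeled, consistency_weight=1.0):
    """Supervised loss on annotated data plus consistency loss on all data.

    labeled   : list of (features, label) pairs
    unlabeled : list of feature vectors without labels
    Note: the full algorithm perturbs student and teacher inputs with
    different noise; that step is omitted here for brevity.
    """
    supervised = sum(bce(predict(student, x), y) for x, y in labeled) / len(labeled)
    all_x = [x for x, _ in labeled] + list(unlabeled)
    consistency = sum(
        (predict(student, x) - predict(teacher, x)) ** 2 for x in all_x
    ) / len(all_x)
    return supervised + consistency_weight * consistency

# Hypothetical toy data: two annotated and four unannotated feature vectors,
# i.e. a 2:1 unannotated-to-annotated ratio.
labeled = [([1.0, 0.0], 1), ([0.0, 1.0], 0)]
unlabeled = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5], [0.7, 0.3]]
student = [0.3, -0.2]
teacher = [0.0, 0.0]

loss = mean_teacher_loss(student, teacher, labeled, unlabeled)
teacher = ema_update(teacher, student)  # applied after each student update
```

In a real training loop the student would be updated by gradient descent on this loss, and `ema_update` would be called after every step so that the teacher provides a smoothed, more stable target for the unannotated compounds.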
