Article

Molecular image-convolutional neural network (CNN) assisted QSAR models for predicting contaminant reactivity toward OH radicals: Transfer learning, data augmentation and model interpretation

Journal

CHEMICAL ENGINEERING JOURNAL
Volume 408

Publisher

ELSEVIER SCIENCE SA
DOI: 10.1016/j.cej.2020.127998

Keywords

Convolutional neural network (CNN); Hydroxyl radical; Model interpretation; Machine learning; Molecular images; QSARs

Funding

  1. National Science Foundation [CBET-1804708, CHEM-1808406]

The study demonstrated that data augmentation and transfer learning can enhance the predictive performance of molecular image-CNN models for predicting the rate constants of compounds toward OH radicals. The models performed comparably to molecular fingerprint-based models while offering a broader applicability domain and reliable predictions for new compounds that are sufficiently similar to the training set.
In this study, we used molecular images as a representation of organic compounds and combined them with a convolutional neural network (CNN) to develop quantitative structure-activity relationships (QSARs) for predicting the rate constants of compounds toward OH radicals. We applied transfer learning and data augmentation to train the molecular image-CNN models, and used the Gradient-weighted Class Activation Mapping (Grad-CAM) method to interpret them. Results showed that data augmentation and transfer learning effectively enhanced the robustness and predictive performance of the models: the root-mean-square error (RMSE) on the test dataset (RMSEtest) decreased from 0.395-0.45 to 0.284-0.339 after applying data augmentation, and the RMSE on the training dataset (RMSEtrain) decreased from 0.452-0.592 to 0.123-0.151 after applying transfer learning. The resulting molecular image-CNN models showed predictive performance (RMSEtest 0.284-0.339) comparable to that of molecular fingerprint-based models (RMSEtest 0.30-0.35). Grad-CAM interpretation showed that the molecular image-CNN models correctly focused on the molecular features in the images and identified the key functional groups that influence reactivity. The applicability domain analysis showed that the molecular image-CNN models have a broader applicability domain than molecular fingerprint-based models, and that the reactivity of any new compound with a maximum similarity of over 0.85 to the compounds in the training dataset can be reliably predicted. This study demonstrated that molecular image-CNN is a new tool for developing QSARs for environmental applications and can be used to build trustworthy models that make meaningful predictions.
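The abstract reports model quality as RMSE and defines the applicability domain through a maximum-similarity threshold of 0.85. The Python sketch below illustrates both ideas under stated assumptions. The abstract does not say which similarity metric or molecular representation the applicability-domain analysis used, so Tanimoto similarity on binary fingerprints (stored as sets of "on" bit indices) is assumed here, and the function names `rmse`, `tanimoto`, and `in_applicability_domain` are hypothetical, not taken from the paper.

```python
import math
from typing import Iterable, Sequence, Set


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error between observed and predicted values.

    Assumption: values are (log-transformed) OH rate constants; the
    abstract does not state the units the reported RMSE refers to.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity of two fingerprints given as sets of 'on' bits.

    Assumed metric for illustration; two empty fingerprints score 0.0.
    """
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def in_applicability_domain(query: Set[int],
                            training: Iterable[Set[int]],
                            threshold: float = 0.85) -> bool:
    """True if the query's maximum similarity to any training compound
    is strictly greater than the threshold ("over 0.85" in the abstract)."""
    best = max((tanimoto(query, fp) for fp in training), default=0.0)
    return best > threshold


# Toy fingerprints (hypothetical bit indices, not real molecules).
training = [{1, 2, 3, 4, 5, 6, 7}, {10, 11, 12}]
query_close = {1, 2, 3, 4, 5, 6, 7, 8}   # similarity 7/8 = 0.875 to the first entry
query_far = {1, 2, 3, 20}                # similarity 3/8 = 0.375 at best

print(in_applicability_domain(query_close, training))  # True
print(in_applicability_domain(query_far, training))    # False
print(rmse([1.0, 2.0], [1.0, 4.0]))
```

Using a strict `>` comparison mirrors the abstract's wording, so a compound at exactly 0.85 falls outside the domain; a real analysis would compute fingerprints with a cheminformatics toolkit rather than hand-written bit sets.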