4.7 Article

Different molecular enumeration influences in deep learning: an example using aqueous solubility

Journal

BRIEFINGS IN BIOINFORMATICS
Volume 22, Issue 3, Pages -

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bib/bbaa092

Keywords

biological sciences; drug discovery; medicinal chemistry; cheminformatics

Funding

  1. Neurobiology and Cognitive Science Center at NTU [NTUCC-109L892703]
  2. Ministry of Science and Technology (Taiwan) [MOST 107-2627-E-002-002]

This study reviewed different molecular representations and focused on using graph and line notations for modeling compounds. It suggests using full enumerations in SMILES notation for better accuracy. A CNN model was utilized to predict solubility, which can handle large datasets without additional chemistry knowledge. Using attention in the decoding network can help explain the contribution of chemical substructures to solubility predicted by the CNN.
Aqueous solubility is a key property driving many chemical and biological phenomena, and it affects both experimental and computational attempts to assess those phenomena. Accurate prediction of solubility is essential yet remains challenging, even with modern computational algorithms. Fingerprint-based, feature-based and molecular graph-based representations have all been used with different deep learning methods for aqueous solubility prediction. It has been clearly demonstrated that the choice of molecular representation affects both model prediction and explainability. In this work, we reviewed different representations and focused on using graph and line notations for modeling. In general, one canonical chemical structure is used to represent one molecule when computing its properties. We carefully examined the commonly used simplified molecular-input line-entry system (SMILES) notation, in which a single string represents a molecule, and proposed using the full enumeration of SMILES strings to achieve better accuracy. A convolutional neural network (CNN) was used. The full enumeration of SMILES can improve the representation of a molecule by describing it from all possible angles. This CNN model can be very robust when dealing with large datasets, since no additional explicit chemistry knowledge is necessary to predict solubility. Traditionally, it has also been hard to use a neural network to explain the contribution of chemical substructures to a single property. We demonstrated the use of attention in the decoding network to detect the parts of a molecule that are relevant to solubility, which can be used to explain the predictions of the CNN.
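
To illustrate what "full enumeration" of SMILES means, the following is a minimal sketch and not the authors' implementation. It assumes a toy molecule description (a list of atom symbols plus single bonds, with no rings) and writes one SMILES string for every choice of starting atom and every ordering of branches. The function name `enumerate_smiles` and the input format are hypothetical; a real pipeline would normally rely on a cheminformatics toolkit to handle rings, bond orders, aromaticity and stereochemistry.

```python
from itertools import permutations


def enumerate_smiles(atoms, bonds):
    """Return every SMILES string for an acyclic, single-bonded molecule.

    atoms: list of element symbols, e.g. ["C", "C", "O"]
    bonds: list of (i, j) index pairs, e.g. [(0, 1), (1, 2)]
    Assumption: no rings, no charges, no explicit bond orders.
    """
    adj = {i: [] for i in range(len(atoms))}
    for a, b in bonds:
        adj[a].append(b)
        adj[b].append(a)

    def write(atom, parent):
        # Yield every SMILES fragment for the subtree rooted at `atom`.
        children = [n for n in adj[atom] if n != parent]
        if not children:
            yield atoms[atom]
            return
        for order in permutations(children):
            partial = [atoms[atom]]
            for i, child in enumerate(order):
                last = i == len(order) - 1
                # Every child except the last is written as a branch in parentheses.
                partial = [
                    p + (s if last else f"({s})")
                    for p in partial
                    for s in write(child, atom)
                ]
            yield from partial

    result = set()
    for start in range(len(atoms)):
        result.update(write(start, None))
    return sorted(result)


# Ethanol: C-C-O
print(enumerate_smiles(["C", "C", "O"], [(0, 1), (1, 2)]))
# ['C(C)O', 'C(O)C', 'CCO', 'OCC']
```

All four strings describe the same molecule. Feeding every variant to the model, rather than only one canonical string, is the idea behind the enumeration described above.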
