Article

Bayesian molecular design with a chemical language model

Journal

JOURNAL OF COMPUTER-AIDED MOLECULAR DESIGN
Volume 31, Issue 4, Pages 379-391

Publisher

SPRINGER
DOI: 10.1007/s10822-016-0008-z

Keywords

Inverse-QSPR; Molecular design; Bayesian analysis; Small organic molecules; Natural language processing; SMILES

Funding

  1. Materials research by Information Integration Initiative (MI2I) project of the Support Program for Starting Up Innovation Hub from Japan Science and Technology Agency (JST)
  2. Japan Society for the Promotion of Science (JSPS) [15H02672]
  3. JST PRESTO
  4. KAITEKI Institute, Inc.
  5. [16J09205]
  6. Grants-in-Aid for Scientific Research [17H05478, 26287063, 15H02672, 16J09205] Funding Source: KAKEN

The aim of computational molecular design is to identify promising hypothetical molecules with a predefined set of desired properties. We address the problem of accelerating materials discovery with state-of-the-art machine learning techniques. The method involves two types of prediction: forward and backward. The objective of the forward prediction is to create a set of machine learning models that predict various properties of a given molecule. Inverting the trained forward models through Bayes' law, we derive a posterior distribution for the backward prediction, which is conditioned on a desired property requirement. By exploring high-probability regions of the posterior with a sequential Monte Carlo technique, molecules that exhibit the desired properties can be created computationally. One major difficulty in the computational creation of molecules is excluding chemically unfavorable structures. To circumvent this issue, we derive a chemical language model that acquires commonly occurring patterns of chemical fragments through natural language processing of ASCII strings of existing compounds, which follow the SMILES chemical language notation. In the backward prediction, the trained language model is used to refine chemical strings such that the properties of the resulting structures fall within the desired property region while chemically unfavorable structures are removed. The method is demonstrated through the design of small organic molecules with property requirements on the HOMO-LUMO gap and internal energy. The R package iqspr is available at the CRAN repository.
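
The Bayesian inversion described in the abstract can be illustrated with a small sketch. It scores a SMILES string S by an unnormalized log-posterior, log p(Y in U | S) + log p(S), where the first term comes from a forward model and the second from a language model over strings. Everything specific here is an assumption made for illustration and is not taken from the paper or from the iqspr package: the character-level bigram prior, the Gaussian forward model, the placeholder `toy_property_mean`, and all function names are hypothetical. The sequential Monte Carlo sampler is not shown.

```python
import math
from collections import Counter, defaultdict


def train_bigram(smiles_list):
    """Count character bigrams, using '^' and '$' as start/end markers."""
    counts = defaultdict(Counter)
    vocab = set()
    for s in smiles_list:
        padded = "^" + s + "$"
        for a, b in zip(padded, padded[1:]):
            counts[a][b] += 1
            vocab.add(b)  # characters that can follow another character
    return counts, len(vocab)


def log_prior(s, counts, vocab_size, alpha=1.0):
    """Add-alpha smoothed bigram log-probability of the string s."""
    padded = "^" + s + "$"
    total = 0.0
    for a, b in zip(padded, padded[1:]):
        row = counts.get(a, Counter())
        num = row[b] + alpha
        den = sum(row.values()) + alpha * vocab_size
        total += math.log(num / den)
    return total


def normal_cdf(x):
    """Standard normal CDF, written with erfc for accuracy in the tails."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))


def log_likelihood(s, predict_mean, sigma, lo, hi):
    """log p(Y in [lo, hi] | S), assuming Y ~ Normal(predict_mean(S), sigma)."""
    mu = predict_mean(s)
    p = normal_cdf((hi - mu) / sigma) - normal_cdf((lo - mu) / sigma)
    return math.log(max(p, 1e-300))  # guard against log(0)


def log_posterior(s, counts, vocab_size, predict_mean, sigma, lo, hi):
    """Unnormalized log-posterior: forward-model term plus language-model term."""
    return (log_likelihood(s, predict_mean, sigma, lo, hi)
            + log_prior(s, counts, vocab_size))


def toy_property_mean(s):
    """Hypothetical placeholder for a trained forward model (not chemistry)."""
    return float(len(s))


if __name__ == "__main__":
    training = ["CCO", "CCN", "CCC", "c1ccccc1"]
    counts, vocab_size = train_bigram(training)
    for candidate in ["CCO", "c1ccccc1", "O1N"]:
        score = log_posterior(candidate, counts, vocab_size,
                              toy_property_mean, sigma=0.5, lo=2.5, hi=3.5)
        print(candidate, score)
```

In this sketch a candidate scores highly only when both terms agree: the forward term rewards strings whose predicted property lies in the target region, and the prior term penalizes character patterns that are rare in the training strings, which is the role the abstract assigns to the chemical language model.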
