4.7 Article

Dataset Construction to Explore Chemical Space with 3D Geometry and Deep Learning

Journal

JOURNAL OF CHEMICAL INFORMATION AND MODELING
Volume 61, Issue 3, Pages 1095-1104

Publisher

AMER CHEMICAL SOC
DOI: 10.1021/acs.jcim.1c00007


Funding

  1. NIH [R35GM127040]


A dataset is crucial for the development of deep learning models, and the new Frag20 dataset with optimized 3D geometries and calculated molecular properties contributes to the development of robust molecular energy prediction models that achieve near chemical accuracy.
A dataset is the basis of deep learning model development, and the success of deep learning models relies heavily on the quality and size of the dataset. In this work, we present a new data preparation protocol and build a large fragment-based dataset, Frag20, which consists of optimized 3D geometries and calculated molecular properties from the Merck molecular force field (MMFF) and DFT at the B3LYP/6-31G* level of theory for more than half a million molecules composed of H, B, C, O, N, F, P, S, Cl, and Br with no more than 20 heavy atoms. Based on the new dataset, we develop robust molecular energy prediction models using a simplified PhysNet architecture for both DFT-optimized and MMFF-optimized geometries, which achieve better than or close to chemical accuracy (1 kcal/mol) on multiple test sets, including CSD20 and Plati20, which are based on experimental crystal structures.
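
The two quantitative criteria stated in the abstract, the element and size limits on molecules and the 1 kcal/mol chemical-accuracy threshold, can be illustrated with a minimal Python sketch. It is not the authors' data preparation protocol, which this page does not describe. All function and constant names are hypothetical, a molecule is assumed to be given simply as a list of element symbols, and the sample energies are made up for illustration.

```python
# Illustrative sketch only: names are hypothetical and not taken from the
# Frag20 code base. A molecule is represented as a list of element symbols.

# Elements and size limit as stated in the abstract.
ALLOWED_ELEMENTS = {"H", "B", "C", "O", "N", "F", "P", "S", "Cl", "Br"}
MAX_HEAVY_ATOMS = 20

# Chemical accuracy threshold quoted in the abstract (kcal/mol).
CHEMICAL_ACCURACY_KCAL_MOL = 1.0


def heavy_atom_count(symbols):
    """Count non-hydrogen atoms."""
    return sum(1 for s in symbols if s != "H")


def passes_element_and_size_filter(symbols):
    """True if every atom is an allowed element and the molecule has
    at most MAX_HEAVY_ATOMS heavy atoms. Empty molecules are rejected."""
    if not symbols:
        return False
    if any(s not in ALLOWED_ELEMENTS for s in symbols):
        return False
    return heavy_atom_count(symbols) <= MAX_HEAVY_ATOMS


def mean_absolute_error(predicted, reference):
    """Mean absolute error between two equal-length sequences of energies."""
    if len(predicted) != len(reference) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


def within_chemical_accuracy(predicted, reference):
    """True if the MAE (in kcal/mol) does not exceed 1 kcal/mol."""
    return mean_absolute_error(predicted, reference) <= CHEMICAL_ACCURACY_KCAL_MOL


if __name__ == "__main__":
    ethanol = ["C", "C", "O"] + ["H"] * 6      # 3 heavy atoms, allowed elements
    silane = ["Si"] + ["H"] * 4                # Si is not in the allowed set
    print(passes_element_and_size_filter(ethanol))
    print(passes_element_and_size_filter(silane))

    # Made-up energies in kcal/mol, for illustration only.
    predicted = [-10.2, -35.9, -7.4]
    reference = [-10.0, -36.5, -7.1]
    print(within_chemical_accuracy(predicted, reference))
```

In practice, the element symbols would come from a cheminformatics toolkit that parses the 3D structures rather than from hand-written lists, and further curation steps beyond these two checks would likely be needed.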
