Article Data Paper

The ANI-1ccx and ANI-1x data sets, coupled-cluster and density functional theory properties for molecules

Journal

SCIENTIFIC DATA
Volume 7, Issue 1

Publisher

NATURE PORTFOLIO
DOI: 10.1038/s41597-020-0473-z

Funding

  1. LDRD program
  2. Center for Nonlinear Studies (CNLS)
  3. Advanced Simulation and Computing (ASC) Program
  4. DOD-ONR [N00014-16-1-2311]
  5. NSF [CHE-1802789, CHE-1802831]
  6. National Science Foundation [ACI-1053575, 1148698, DMR110088]
  7. U.S. DOE Office of Science

Maximum diversification of data is a central theme in building generalized and accurate machine learning (ML) models. In chemistry, ML has been used to develop models for predicting molecular properties, for example potential energy surfaces and atomic charges calculated with quantum mechanics (QM). The ANI-1x and ANI-1ccx ML-based general-purpose potentials for organic molecules were developed through active learning, an automated data diversification process. Here, we describe the ANI-1x and ANI-1ccx data sets. To demonstrate data diversity, we visualize it with a dimensionality reduction scheme and contrast it with existing data sets. The ANI-1x data set contains multiple QM properties from 5 million density functional theory calculations, while the ANI-1ccx data set contains 500 thousand data points obtained with an accurate CCSD(T)/CBS extrapolation. Approximately 14 million CPU core-hours were expended to generate this data. Multiple QM-calculated properties for the chemical elements C, H, N, and O are provided, including energies, atomic forces, multipole moments, and atomic charges. We provide this data to the community to aid research and development of ML models for chemistry.
