Article, Data Paper

QM7-X, a comprehensive dataset of quantum-mechanical properties spanning the chemical space of small organic molecules

Journal

SCIENTIFIC DATA
Volume 8, Issue 1

Publisher

NATURE PORTFOLIO
DOI: 10.1038/s41597-021-00812-2

Funding

  1. European Research Council (ERC-CoG grant BeStMo)
  2. College of Arts and Sciences at Cornell University
  3. DOE Office of Science User Facility [DE-AC02-06CH11357]

The QM7-X dataset contains approximately 4.2 million equilibrium and non-equilibrium structures of small organic molecules, with comprehensive coverage of their physicochemical properties. It is expected to play a critical role in the development of next-generation machine-learning models for exploring broader regions of chemical compound space and designing molecules with targeted properties.
We introduce QM7-X, a comprehensive dataset of 42 physicochemical properties for approximately 4.2 million equilibrium and non-equilibrium structures of small organic molecules with up to seven non-hydrogen (C, N, O, S, Cl) atoms. To span this fundamentally important region of chemical compound space (CCS), QM7-X includes an exhaustive sampling of (meta-)stable equilibrium structures, comprising constitutional/structural isomers and stereoisomers, e.g., enantiomers and diastereomers (including cis-/trans- and conformational isomers), as well as 100 non-equilibrium structural variations thereof, to reach a total of approximately 4.2 million molecular structures. Computed at the tightly converged quantum-mechanical PBE0+MBD level of theory, QM7-X contains global (molecular) and local (atom-in-a-molecule) properties ranging from ground-state quantities (such as atomization energies and dipole moments) to response quantities (such as polarizability tensors and dispersion coefficients). By providing a systematic, extensive, and tightly converged dataset of quantum-mechanically computed physicochemical properties, we expect that QM7-X will play a critical role in the development of next-generation machine-learning-based models for exploring greater swaths of CCS and performing in silico design of molecules with targeted properties.
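The structure counts quoted in the abstract imply a rough number of equilibrium structures. The sketch below is only back-of-the-envelope arithmetic: it assumes every equilibrium structure contributes itself plus exactly 100 non-equilibrium variations, and it treats "approximately 4.2 million" as exactly 4,200,000. The function name is illustrative and not part of the dataset or any accompanying tooling.

```python
# Back-of-the-envelope estimate from the figures quoted in the abstract.
# Assumption: total = n_equilibrium * (1 + variations_per_equilibrium).
TOTAL_STRUCTURES = 4_200_000        # "approximately 4.2 million", taken as exact here
VARIATIONS_PER_EQUILIBRIUM = 100    # non-equilibrium variations per equilibrium structure


def estimate_equilibrium_count(total: int, variations: int) -> int:
    """Solve total = n * (1 + variations) for n and round to the nearest integer."""
    return round(total / (1 + variations))


n_equilibrium = estimate_equilibrium_count(TOTAL_STRUCTURES, VARIATIONS_PER_EQUILIBRIUM)
print(f"Estimated equilibrium structures: {n_equilibrium:,}")
```

Under these assumptions the estimate comes out on the order of forty thousand equilibrium structures; the exact count should be taken from the paper itself.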
