Article, Data Paper

Comprehensive exploration of graphically defined reaction spaces

Journal

SCIENTIFIC DATA
Volume 10, Issue 1

Publisher

NATURE PORTFOLIO
DOI: 10.1038/s41597-023-02043-z

The concept of a graphically-defined model reaction has been used to address the data gap in existing reaction transition state (TS) databases. The resulting dataset, called the Reaction Graph Depth 1 (RGD1) dataset, is composed of 176,992 organic reactions with validated TS and various other properties. RGD1 is the largest and most chemically diverse TS dataset published to date, and can be used to develop novel machine learning models for predicting reaction properties.
Existing reaction transition state (TS) databases are comparatively small and lack chemical diversity. Here, this data gap has been addressed using the concept of a graphically defined model reaction to comprehensively characterize a reaction space associated with C, H, O, and N containing molecules with up to 10 heavy (non-hydrogen) atoms. The resulting dataset is composed of 176,992 organic reactions, each possessing at least one validated TS, activation energy, heat of reaction, reactant and product geometries, frequencies, and atom-mapping. For 33,032 reactions, more than one TS was discovered by conformational sampling, allowing conformational errors in TS prediction to be assessed. Data are supplied at the GFN2-xTB and B3LYP-D3/TZVP levels of theory. A subset of reactions was recalculated at the CCSD(T)-F12/cc-pVDZ-F12 and ωB97X-D2/def2-TZVP levels to establish relative errors. The resulting collection of reactions and properties is called the Reaction Graph Depth 1 (RGD1) dataset. RGD1 represents the largest and most chemically diverse TS dataset published to date and should find immediate use in developing novel machine learning models for predicting reaction properties.
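As a rough illustration of how the per-reaction properties named in the abstract relate to one another, the sketch below derives an activation energy, a reaction energy, and a TS conformer spread from electronic energies. The record layout, field names, and function names are hypothetical and are not the RGD1 file schema. The sketch also works with plain electronic energy differences, so a true heat of reaction would additionally need thermal and zero-point corrections.

```python
from dataclasses import dataclass
from typing import List

# Approximate conversion factor from Hartree to kcal/mol.
HARTREE_TO_KCAL_MOL = 627.5095


@dataclass
class ReactionRecord:
    """Hypothetical reaction record; NOT the actual RGD1 schema."""
    reactant_energy: float    # electronic energy of the reactant, in Hartree
    product_energy: float     # electronic energy of the product, in Hartree
    ts_energies: List[float]  # one energy per validated TS conformer, in Hartree


def activation_energy(rec: ReactionRecord) -> float:
    """Barrier from the reactant to the lowest-energy TS, in kcal/mol."""
    return (min(rec.ts_energies) - rec.reactant_energy) * HARTREE_TO_KCAL_MOL


def reaction_energy(rec: ReactionRecord) -> float:
    """Product minus reactant electronic energy, in kcal/mol (no thermal terms)."""
    return (rec.product_energy - rec.reactant_energy) * HARTREE_TO_KCAL_MOL


def ts_conformer_spread(rec: ReactionRecord) -> float:
    """Gap between the highest and lowest TS conformers, in kcal/mol.

    For a reaction with a single TS this is zero. For reactions with
    several TS conformers it gives one simple measure of how much a
    barrier could be overestimated by picking the wrong conformer.
    """
    return (max(rec.ts_energies) - min(rec.ts_energies)) * HARTREE_TO_KCAL_MOL


if __name__ == "__main__":
    # Made-up energies for illustration only.
    example = ReactionRecord(
        reactant_energy=-100.00,
        product_energy=-100.02,
        ts_energies=[-99.95, -99.94],
    )
    print(f"Activation energy:   {activation_energy(example):.2f} kcal/mol")
    print(f"Reaction energy:     {reaction_energy(example):.2f} kcal/mol")
    print(f"TS conformer spread: {ts_conformer_spread(example):.2f} kcal/mol")
```

Taking the minimum over TS conformers is one possible convention for the barrier. Depending on the use, a Boltzmann-weighted value or a per-conformer analysis may be more appropriate.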
