Article

Formulation of Small Test Sets Using Large Test Sets for Efficient Assessment of Quantum Chemistry Methods

Journal

JOURNAL OF CHEMICAL THEORY AND COMPUTATION
Volume 14, Issue 8, Pages 4254-4262

Publisher

AMER CHEMICAL SOC
DOI: 10.1021/acs.jctc.8b00514

Funding

  1. Japan Society for the Promotion of Science (JSPS) [16H07074001]
  2. Japan Society for the Promotion of Science (JSPS) (Program for Advancing Strategic International Networks to Accelerate the Circulation of Talented Researchers)
  3. RIKEN Advanced Center for Computing and Communication (ACCC), Japan [Q18266]
  4. National Computational Infrastructure (NCI), Australia

Abstract

In the present study, we have examined in detail literature data on deviations for a wide range of (mainly) DFT methods for the extensive MGCDB82 set (~4400 data points) of main-group thermochemical quantities. We use these data and standard statistical techniques (lasso regularization and forward selection) to devise the MG8 model, which linearly combines assessment results for a collection of small data sets to accurately estimate the mean absolute deviation (MAD) for MGCDB82. The MG8 model contains a total of 64 data points representing noncovalent interactions, isomerization energies, thermochemical properties, and barrier heights. It is thus well suited for rapid evaluation of new quantum chemistry procedures. We propose a value of ~4 kJ mol⁻¹ for the MAD estimated by the MG8 model (EMAD(MG8)) as an initial indicator of a highly robust quantum chemistry method, with large deviations occurring mainly for properties (such as heats of formation) that are difficult to compute accurately. For methods with larger EMADs, we emphasize the importance of more thorough testing, as these methods are likely to have a larger number of outliers, and it may be harder to anticipate the circumstances under which large deviations occur. In this connection, we have applied the same generally applicable statistical techniques to formulate further small-data-set models for assessing accuracy for some properties that are covered neither by MG8 nor by MGCDB82. These include the MOR13 model for metal-organic reactions, the SBGS model for semiconductor band gaps, and the MB13 model for stress-testing methods with artificial species.
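The idea of estimating a large-set MAD from a linear combination of small-set results can be illustrated with a rough sketch of forward selection. This is not the authors' procedure or data: the function name `forward_select`, the intercept term, the mean-absolute-error selection criterion, the matrix shapes, and the synthetic weights are all assumptions made for illustration, and the lasso step is not shown.

```python
import numpy as np

def forward_select(X, y, k):
    """Greedily pick k columns of X whose least-squares fit to y
    gives the lowest mean absolute residual (hypothetical helper)."""
    chosen = []
    for _ in range(k):
        best_j, best_err = None, np.inf
        for j in range(X.shape[1]):
            if j in chosen:
                continue
            cols = chosen + [j]
            # Design matrix: selected subset MADs plus an intercept (assumed).
            A = np.column_stack([X[:, cols], np.ones(len(y))])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            err = np.mean(np.abs(A @ coef - y))
            if err < best_err:
                best_j, best_err = j, err
        chosen.append(best_j)
    # Refit once on the final selection to get the combination weights.
    A = np.column_stack([X[:, chosen], np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return chosen, coef

# Synthetic stand-in data: rows are methods, columns are per-subset MADs.
rng = np.random.default_rng(0)
n_methods, n_subsets = 200, 12
X = rng.uniform(1.0, 20.0, size=(n_methods, n_subsets))

# Assume the large-set MAD depends on only three subsets, plus small noise.
true_w = np.zeros(n_subsets)
true_w[[2, 5, 9]] = [0.5, 0.3, 0.2]
y = X @ true_w + rng.normal(0.0, 0.05, n_methods)

chosen, coef = forward_select(X, y, 3)

# Estimated MAD for each method from the selected small subsets only.
emad = np.column_stack([X[:, chosen], np.ones(n_methods)]) @ coef
fit_error = float(np.mean(np.abs(emad - y)))
```

In this toy setting, `chosen` holds the indices of the small subsets retained and `coef` holds their weights followed by the intercept; a new method would then need to be run only on the retained subsets to obtain an estimated MAD.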
