Article

Density Prediction Models for Energetic Compounds Merely Using Molecular Topology

Journal

Journal of Chemical Information and Modeling
Volume 61, Issue 6, Pages 2582-2593

Publisher

American Chemical Society
DOI: 10.1021/acs.jcim.0c01393

Funding

  1. Science Challenge Project [TZ-2018004]

The study introduces three machine learning models (support vector machine, random forest, and graph neural network) that take molecular topology as the only input, establishing direct mappings between molecular structure and density. After training and testing on over 2000 nitro compounds, the graph neural network model was found to be slightly more accurate than the conventional DFT-QSPR method, making it suitable for high-throughput screening of energetic compounds.
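The abstract does not specify the GNN architecture. As a rough illustration of how a graph network can map molecular topology (atoms as nodes, bonds as edges) directly to a single number such as density, here is a minimal, untrained message-passing sketch in plain Python. The element vocabulary, one-hot atom features, layer count, sum aggregation, sum readout, and the class name `TinyMPNN` are all hypothetical choices for illustration, not the authors' model; in practice the weights would be fitted to experimental densities.

```python
import random

# Hypothetical toy featurization: one-hot over a small element vocabulary.
ELEMENTS = ["C", "H", "N", "O"]


def one_hot(symbol):
    return [1.0 if symbol == e else 0.0 for e in ELEMENTS]


def matvec(weights, vec):
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def relu(vec):
    return [max(0.0, x) for x in vec]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class TinyMPNN:
    """Hypothetical sketch: sum-aggregation message passing, sum readout, linear head (untrained)."""

    def __init__(self, in_dim, hidden_dim, n_layers=2, seed=0):
        rng = random.Random(seed)
        self.embed = random_matrix(hidden_dim, in_dim, rng)
        self.layers = [
            (random_matrix(hidden_dim, hidden_dim, rng),
             random_matrix(hidden_dim, hidden_dim, rng))
            for _ in range(n_layers)
        ]
        self.head = [rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]
        self.bias = 0.0

    def forward(self, atoms, bonds):
        # atoms: element symbols; bonds: undirected (i, j) index pairs.
        neighbors = {i: [] for i in range(len(atoms))}
        for i, j in bonds:
            neighbors[i].append(j)
            neighbors[j].append(i)
        h = [relu(matvec(self.embed, one_hot(a))) for a in atoms]
        for w_self, w_msg in self.layers:
            new_h = []
            for i in range(len(atoms)):
                # Message: sum of neighbor states; update: ReLU(W_self h_i + W_msg m_i).
                msg = [sum(h[j][k] for j in neighbors[i]) for k in range(len(h[i]))]
                upd = [a + b for a, b in zip(matvec(w_self, h[i]), matvec(w_msg, msg))]
                new_h.append(relu(upd))
            h = new_h
        # Sum readout makes the graph vector independent of atom ordering.
        graph_vec = [sum(node[k] for node in h) for k in range(len(h[0]))]
        return sum(w * x for w, x in zip(self.head, graph_vec)) + self.bias


if __name__ == "__main__":
    # Heavy-atom graph of nitromethane (C-N, N-O, N-O); hydrogens left implicit.
    model = TinyMPNN(in_dim=len(ELEMENTS), hidden_dim=8)
    print(model.forward(["C", "N", "O", "O"], [(0, 1), (1, 2), (1, 3)]))
```

The sum readout is one common way to make the prediction invariant to how atoms are numbered, which matters when the only input is the bond graph; the printed value is meaningless until the weights are trained.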
Newly developed high-throughput methods for property prediction make materials design faster and more efficient. Density is an important physical property of energetic compounds, used to assess detonation velocity and detonation pressure, but current density prediction models remain costly because calculating their molecular descriptors is time-consuming. To improve the efficiency of screening potential energetic compounds, density prediction methods with higher accuracy and lower time cost are urgently needed; one possible solution is to establish direct mappings between molecular structure and density. We propose three machine learning (ML) models, support vector machine (SVM), random forest (RF), and graph neural network (GNN), that use molecular topology as the only input. The widely applied quantitative structure-property relationship based on density functional theory (DFT-QSPR) serves as the benchmark for evaluating model accuracy. All four models are trained and tested on the same data set of over 2000 reported nitro compounds retrieved from the Cambridge Structural Database. On the independent test set, the proportions of compounds with prediction error below 5% are 48, 63, 85, and 88% for the SVM, RF, DFT-QSPR, and GNN models, respectively. The results show that, for the SVM and RF models, fingerprint bit vectors alone are insufficient to obtain good QSPRs. In contrast, the mapping between molecular structure and density can be well established by a GNN operating on molecular topology, with accuracy slightly better than that of the time-consuming DFT-QSPR method. Because the GNN-based model offers higher accuracy and lower computational cost than the widely accepted DFT-QSPR model, it is better suited to high-throughput screening of energetic compounds.
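The headline comparison is the share of test compounds whose density is predicted within 5%. A minimal sketch of that calculation follows, assuming "prediction error" means the relative error |predicted - experimental| / experimental (the abstract does not define it precisely). The function name `fraction_within_tolerance` and the example densities are hypothetical and are not data from the paper.

```python
def fraction_within_tolerance(y_true, y_pred, tol=0.05):
    """Fraction of compounds whose relative error |pred - true| / true is below tol.

    Assumes 'prediction error' is relative to the experimental density.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    hits = sum(1 for t, p in zip(y_true, y_pred) if abs(p - t) / t < tol)
    return hits / len(y_true)


if __name__ == "__main__":
    # Hypothetical densities in g/cm^3, for illustration only.
    measured = [1.80, 1.65, 1.92, 1.71]
    predicted = [1.78, 1.79, 1.90, 1.60]
    print(f"{fraction_within_tolerance(measured, predicted):.0%}")  # prints 50%
```

Here two of the four hypothetical predictions fall within 5% of the measured value, so the score is 50%; the paper reports the same kind of proportion (48 to 88%) over its independent test set.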
