Article

Machine learning transition temperatures from 2D structure

Journal

JOURNAL OF MOLECULAR GRAPHICS & MODELLING
Volume 105

Publisher

ELSEVIER SCIENCE INC
DOI: 10.1016/j.jmgm.2021.107848

Keywords

Cheminformatics; Machine learning; Gradient boosting; Phase transitions; Melting and boiling points; Quantitative structure-property relationships

Funding

  1. CCDC Army Research Laboratory [W911NF-19-2-0090]
  2. DOD High Performance Computing Modernization Program at the ARL DoD Supercomputing Resource Center

A priori knowledge of physicochemical properties can accelerate materials discovery, while data science tools are increasingly important for predicting material properties. This study extends the UPPER molecular representation and constructs a framework for predicting experimental transition temperatures using machine learning techniques.
A priori knowledge of physicochemical properties such as melting and boiling points could expedite materials discovery. However, theoretical modeling from first principles poses a challenge for efficient virtual screening of potential candidates. As an alternative, the tools of data science are becoming increasingly important for exploring chemical datasets and predicting material properties. Herein, we extend a molecular representation, or set of descriptors, known as the Unified Physicochemical Property Estimation Relationships (UPPER), which was first developed by Yalkowsky and coworkers for quantitative structure-property relationship modeling. This representation has group-constitutive and geometrical descriptors that map to enthalpy and entropy, the two thermodynamic quantities that drive thermal phase transitions. We extend the UPPER representation to include additional information about sp²-bonded fragments. Additionally, instead of using the UPPER descriptors in a series of thermodynamically inspired calculations, as Yalkowsky did, we use the descriptors to construct a vector representation for use with machine learning techniques. This concise, easy-to-compute representation, combined with a gradient-boosting decision tree model, provides an appealing framework for predicting experimental transition temperatures across a diverse chemical space. An application to energetic materials shows that the method is predictive, despite a relatively modest reference dataset of energetic materials. We also report competitive results on diverse public melting-point datasets (OCHEM, Enamine, Bradley, and Bergström) comprising over 47k structures. Open source software is available at https://github.com/USArmyResearchLab/ARL-UPPER. Published by Elsevier Inc.
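
The pipeline summarized above (a fixed-length descriptor vector fed to a gradient-boosting decision tree model) can be illustrated with a minimal, dependency-free sketch. It is not the ARL-UPPER implementation. The fragment vocabulary (`FRAGMENTS`), the geometric descriptor names (`symmetry_number`, `flexibility`), and all function and class names are hypothetical. The learner is a simplified least-squares gradient boosting of depth-1 trees (stumps), swapped in for the full gradient-boosting decision trees used in the study.

```python
from collections import Counter

# HYPOTHETICAL descriptor vocabulary for illustration only;
# this is not the actual ARL-UPPER feature set.
FRAGMENTS = ["CH3", "CH2", "aromatic_CH", "C=O", "OH"]
GEOMETRY = ["symmetry_number", "flexibility"]


def featurize(fragments, geometry):
    """Concatenate fragment counts and geometric descriptors into one vector."""
    counts = Counter(fragments)
    return [float(counts[f]) for f in FRAGMENTS] + [
        float(geometry[g]) for g in GEOMETRY
    ]


def fit_stump(X, residuals):
    """Return (feature, threshold, left_value, right_value) minimizing squared error."""
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        # Candidate thresholds: midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            l_mean = sum(left) / len(left)
            r_mean = sum(right) / len(right)
            sse = sum((r - l_mean) ** 2 for r in left) + sum(
                (r - r_mean) ** 2 for r in right
            )
            if best is None or sse < best[0]:
                best = (sse, j, thr, l_mean, r_mean)
    if best is None:  # every feature is constant: predict the mean residual
        mean = sum(residuals) / len(residuals)
        return (0, float("inf"), mean, mean)
    return best[1:]


def predict_stump(stump, row):
    j, thr, left_value, right_value = stump
    return left_value if row[j] <= thr else right_value


class GradientBoostedStumps:
    """Least-squares gradient boosting with depth-1 trees (stumps)."""

    def __init__(self, n_estimators=100, learning_rate=0.1):
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate

    def fit(self, X, y):
        self.base = sum(y) / len(y)  # initial model: the mean target
        self.stumps = []
        pred = [self.base] * len(y)
        for _ in range(self.n_estimators):
            # For squared loss, the negative gradient is the residual y - pred.
            residuals = [t - p for t, p in zip(y, pred)]
            stump = fit_stump(X, residuals)
            self.stumps.append(stump)
            pred = [
                p + self.learning_rate * predict_stump(stump, row)
                for p, row in zip(pred, X)
            ]
        return self

    def predict(self, X):
        return [
            self.base
            + self.learning_rate * sum(predict_stump(s, row) for s in self.stumps)
            for row in X
        ]
```

For example, `featurize(["CH3", "CH3", "OH"], {"symmetry_number": 2, "flexibility": 1.5})` returns `[2.0, 0.0, 0.0, 0.0, 1.0, 2.0, 1.5]`. A list of such vectors, paired with measured transition temperatures, would then be passed to `GradientBoostedStumps().fit(X, y)`. A production workflow would more likely rely on a mature library (for instance, scikit-learn's `GradientBoostingRegressor`) with deeper trees and cross-validated hyperparameters.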

