Article

Hybrid, Interpretable Machine Learning for Thermodynamic Property Estimation using Grammar2vec for Molecular Representation

Journal

FLUID PHASE EQUILIBRIA
Volume 561

Publisher

ELSEVIER
DOI: 10.1016/j.fluid.2022.113531

Funding

  1. National Science Foundation (NSF), Directorate for Engineering, Emerging Frontiers & Multidisciplinary Activities [2132142]

In this study, a Grammar2vec framework based on SMILES grammar is proposed for generating dense and numeric molecular representations. The framework embeds molecular structural information in the grammar rules of SMILES string representations. Using Grammar2vec representations, machine learning models are built to predict the normal boiling point and critical temperature of molecules, and their performance is compared with group contribution methods. The results demonstrate that Grammar2vec is an effective approach for molecular representation.
Property prediction models have been developed for several decades with varying degrees of performance and complexity, from group contribution-based methods to molecular simulation-based methods. An interesting issue in this area is finding a representation of molecules that is inherently suited to the property modeling problem. Here, we propose Grammar2vec, a SMILES grammar-based framework for generating dense, numeric molecular representations. Grammar2vec embeds the molecular structural information contained in the grammar rules underlying SMILES string representations of molecules. We use Grammar2vec representations to build machine learning (ML)-based models for estimating normal boiling point (T_b) and critical temperature (T_c) and benchmark their performance against the widely used group contribution (GC)-based methods. To ensure interpretability of the developed ML model, we perform a Shapley values-based analysis to estimate feature importance and simplify (or prune) the trained model.
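
The abstract does not say how the Shapley values were computed, so the following is only a minimal sketch of the general idea, not the authors' procedure. It computes exact Shapley values by enumerating feature subsets for a toy linear model. The weights, baseline, and input vector are hypothetical placeholders, not Grammar2vec features or fitted parameters, and "absent" features are simply replaced by a baseline value, which is one common convention among several.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values for a set function value_fn(frozenset) -> float."""
    players = range(n_features)
    phi = [0.0] * n_features
    for i in players:
        others = [j for j in players if j != i]
        for size in range(len(others) + 1):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to coalition s.
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


# Hypothetical toy setup (assumption): a linear model over three features.
WEIGHTS = [2.0, -1.0, 0.5]
BASELINE = [0.0, 0.0, 0.0]   # value used when a feature is "absent"
X = [1.0, 3.0, 4.0]          # the instance being explained


def model(features):
    return sum(w * f for w, f in zip(WEIGHTS, features))


def value_fn(present):
    # Features outside the coalition are replaced by their baseline value.
    masked = [X[j] if j in present else BASELINE[j] for j in range(len(X))]
    return model(masked)


phi = shapley_values(value_fn, len(X))

# Rank features by absolute attribution; low-ranked ones are pruning candidates.
ranking = sorted(range(len(X)), key=lambda j: abs(phi[j]), reverse=True)
```

For a linear model each attribution reduces to the weight times the feature's deviation from the baseline, and the attributions sum to the difference between the prediction and the baseline prediction. Exact enumeration scales exponentially with the number of features, so for realistic feature counts an approximation scheme would be used instead.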
