4.6 Article

Treat Molecular Linear Notations as Sentences: Accurate Quantitative Structure-Property Relationship Modeling via a Natural Language Processing Approach

Journal

INDUSTRIAL & ENGINEERING CHEMISTRY RESEARCH
Volume 62, Issue 12, Pages 5336-5346

Publisher

AMER CHEMICAL SOC
DOI: 10.1021/acs.iecr.2c04070

QSPR modeling is a widely used method for estimating molecular properties from structural information, and it has been applied to the search for new solvents, pharmaceuticals, and materials with desired properties. This work treats SMILES as a chemical language and constructs a deep pyramid convolutional neural network to extract information from SMILES sentences. A case study predicting the logarithm of the octanol-water partition coefficient demonstrates the approach: the resulting models outperform a precedent reference model, which suggests that natural language processing techniques are useful for molecular information mining and for exploring chemical property space.
Quantitative structure-property relationship (QSPR) modeling estimates molecular properties from structural information and is widely applied in exploring new solvents, pharmaceuticals, and materials with desired properties. In QSPR modeling, the simplified molecular input line-entry system (SMILES) is a popular molecular representation with its own vocabulary and syntax. Herein, SMILES is considered a chemical language, and each SMILES notation is treated as a sentence. A deep pyramid convolutional neural network architecture is constructed to extract information from SMILES sentences, and a feed-forward neural network is used for the property correlation. A case study predicting the logarithm of the octanol-water partition coefficient is conducted to demonstrate the effectiveness of the proposed approach. The developed QSPR models outperform a precedent reference model, which provides insights for applying natural language processing technologies to molecular information mining and the exploration of chemical property space.
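To make the "SMILES as a sentence" idea concrete, the sketch below splits a SMILES string into tokens (its "words"), builds a vocabulary, and encodes a string as a fixed-length sequence of integer ids that an embedding layer could consume. This is only an illustration: the abstract does not state the paper's tokenization scheme, so the regular expression, the `<pad>`/`<unk>` tokens, and the function names `tokenize`, `build_vocab`, and `encode` are all assumptions, and the pattern covers only common SMILES symbols.

```python
import re

# Assumed token pattern (not taken from the paper): bracket atoms first,
# then two-letter atoms, so "Cl" is not split into "C" and "l".
SMILES_TOKEN = re.compile(
    r"\[[^\]]+\]"            # bracket atoms, e.g. [C@H], [NH4+]
    r"|Br|Cl"                # two-letter organic-subset atoms
    r"|%\d{2}"               # two-digit ring-closure labels, e.g. %12
    r"|[BCNOPSFI]"           # one-letter aliphatic atoms
    r"|[bcnops]"             # aromatic atoms
    r"|[0-9]"                # single-digit ring-closure labels
    r"|[()=#\-+\\/:.~@*$]"   # branches, bonds, and other symbols
)


def tokenize(smiles):
    """Split a SMILES string into tokens; fail on unrecognized characters."""
    tokens = SMILES_TOKEN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized characters in SMILES: {smiles!r}")
    return tokens


def build_vocab(corpus):
    """Map each token seen in the corpus to an integer id (0 = pad, 1 = unknown)."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for smiles in corpus:
        for token in tokenize(smiles):
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(smiles, vocab, max_len):
    """Convert a SMILES string to a list of ids, truncated or padded to max_len."""
    ids = [vocab.get(t, vocab["<unk>"]) for t in tokenize(smiles)][:max_len]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))


if __name__ == "__main__":
    vocab = build_vocab(["CCO", "c1ccccc1"])
    print(tokenize("C[C@H](N)C(=O)O"))
    print(encode("CCO", vocab, max_len=5))
```

Sequences encoded this way are the kind of input a text-classification-style network, such as the convolutional architecture named in the abstract, would typically take after an embedding layer; the network itself is not sketched here.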
