Article

Img2Mol – accurate SMILES recognition from molecular graphical depictions

Journal

CHEMICAL SCIENCE
Volume 12, Issue 42, Pages 14174-14181

Publisher

ROYAL SOC CHEMISTRY
DOI: 10.1039/d1sc01839f

Funding

  1. Bayer AG Life Science Collaboration (DeepMinds)
  2. Bayer AG Life Science Collaboration (Explainable AI)
  3. Bayer AG's PhD scholarships
  4. European Commission under the Horizon2020 Framework Program for Research and Innovation [963845, 956832]
  5. Marie Curie Actions (MSCA) [956832]

Summary

The paper introduces a model that combines deep convolutional neural network learning and a pre-trained decoder to accurately translate molecular images into SMILES representation. Evaluation shows that the model can correctly translate up to 88% of molecular images.

Abstract

The automatic recognition of the molecular content of a molecule's graphical depiction is an extremely challenging problem that remains largely unsolved despite decades of research. Recent advances in neural machine translation enable the auto-encoding of molecular structures in a continuous vector space of fixed size (latent representation) with low reconstruction errors. In this paper, we present a fast and accurate model combining deep convolutional neural network learning from molecule depictions and a pre-trained decoder that translates the latent representation into the SMILES representation of the molecules. This combination allows us to precisely infer a molecular structure from an image. Our rigorous evaluation shows that Img2Mol is able to correctly translate up to 88% of the molecular depictions into their SMILES representation. A pretrained version of Img2Mol is made publicly available on GitHub for non-commercial users.
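
The reported figure of up to 88% is a translation accuracy: the share of depictions whose predicted SMILES matches the reference structure. The following is a minimal sketch of how such a score could be computed, not the authors' evaluation code. The function name `translation_accuracy`, the `canonicalize` parameter, and the toy strings are all hypothetical. Because one molecule can be written as several valid SMILES strings (for example `CCO` and `OCC` both denote ethanol), a real evaluation would canonicalize both sides with a cheminformatics toolkit; the default here only strips whitespace and is a placeholder assumption.

```python
from typing import Callable, Sequence


def translation_accuracy(
    predicted: Sequence[str],
    reference: Sequence[str],
    canonicalize: Callable[[str], str] = str.strip,
) -> float:
    """Return the fraction of predictions that match their reference.

    `canonicalize` is a hypothetical hook: the default only strips
    whitespace, so it does NOT recognise equivalent SMILES spellings.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("cannot score an empty evaluation set")
    # Count pairs whose canonical forms are identical strings.
    hits = sum(
        canonicalize(p) == canonicalize(r) for p, r in zip(predicted, reference)
    )
    return hits / len(reference)


# Toy data for illustration only; these are not results from the paper.
reference = ["CCO", "c1ccccc1", "CC(=O)O", "CN"]
predicted = ["CCO", "c1ccccc1", "CC(=O)N", "CN "]
accuracy = translation_accuracy(predicted, reference)
print(f"{accuracy:.0%}")
```

With the placeholder canonicalizer, a prediction of `OCC` against a reference of `CCO` would be counted as wrong even though both describe the same molecule, which is why a chemistry-aware canonicalizer should be passed in for any real comparison.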
