Article

ABC-Net: a divide-and-conquer based deep learning architecture for SMILES recognition from molecular images

Journal

BRIEFINGS IN BIOINFORMATICS
Volume 23, Issue 2

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bib/bbac033

Keywords

optical chemical structure recognition; divide and conquer; deep learning; fully convolutional neural network

Funding

  1. National Key Research and Development Program of China [2021YFF1201400]
  2. National Natural Science Foundation of China [22173118, U1811462]
  3. Hunan Provincial Science Fund for Distinguished Young Scholars [2021JJ10068]
  4. Science and Technology Innovation Program of Hunan Province [2021RC4011]
  5. Changsha Municipal Natural Science Foundation [kq2014144]
  6. Changsha Science and Technology Bureau project [kq2001034]
  7. HKBU Strategic Development Fund project [SDF19-0402-P02]


This paper presents ABC-Net, a deep neural network model that directly predicts molecular graph structures. Following the divide-and-conquer principle, each atom or bond is modeled as a single point at its center, and a fully convolutional neural network identifies these points and predicts their properties, which allows the molecular structure to be recovered. Experimental results show a significant improvement in recognition performance.

Structural information about chemical compounds is usually presented as pictorial images in scientific documents, which computers cannot easily understand or manipulate. This makes optical chemical structure recognition (OCSR) an essential tool for automatically mining knowledge from the enormous body of literature. However, existing OCSR methods fall far short of practical requirements because of their poor recovery accuracy. In this paper, we developed a deep neural network model named ABC-Net (Atom and Bond Center Network) to predict graph structures directly. Based on the divide-and-conquer principle, we propose to model each atom or bond as a single point at its center. In this way, we can leverage a fully convolutional neural network (CNN) to generate a series of heat-maps that identify these points and predict their properties, such as atom types, atom charges and bond types. The molecular structure can then be recovered by assembling the detected atoms and bonds. Our approach integrates all detection and property-prediction tasks into a single fully convolutional network, which is scalable and processes molecular images efficiently. Experimental results demonstrate that our method achieves a significant improvement in recognition performance compared with publicly available tools. The proposed method could be considered a promising solution to OCSR problems and a starting point for acquiring molecular information from the literature.
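The abstract describes two steps: locating atom and bond centers as peaks in heat-maps, then assembling the detected atoms and bonds into a molecular graph. The sketch below illustrates that idea in plain Python; it is not the authors' ABC-Net code. It assumes the network has already produced an atom-center heat-map, a bond-center heat-map, and per-pixel atom-type and bond-order grids (hypothetical stand-ins for the property heat-maps). The names `find_peaks`, `assemble`, `threshold` and `max_dist` are illustrative, peak finding uses a 3x3 local-maximum test of the kind common in center-point detectors, and attaching each bond center to the atom pair whose midpoint lies nearest is an assumed assembly rule, not one stated in the paper.

```python
from itertools import combinations
from math import hypot
from typing import List, Optional, Tuple

# Hypothetical decoder sketch; NOT the published ABC-Net implementation.
Grid = List[List[float]]
Atom = Tuple[int, int, str]   # (row, col, element)
Bond = Tuple[int, int, int]   # (atom index i, atom index j, bond order)


def find_peaks(heatmap: Grid, threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Return (row, col) of cells that reach `threshold` and are the
    maximum of their 3x3 neighbourhood (local-maximum suppression)."""
    h, w = len(heatmap), len(heatmap[0])
    peaks = []
    for r in range(h):
        for c in range(w):
            v = heatmap[r][c]
            if v < threshold:
                continue
            window = [heatmap[rr][cc]
                      for rr in range(max(0, r - 1), min(h, r + 2))
                      for cc in range(max(0, c - 1), min(w, c + 2))]
            if v >= max(window):
                peaks.append((r, c))
    return peaks


def assemble(atom_heat: Grid, atom_type: List[List[str]],
             bond_heat: Grid, bond_order: List[List[int]],
             threshold: float = 0.5,
             max_dist: float = 1.5) -> Tuple[List[Atom], List[Bond]]:
    """Detect atom and bond centres, then attach each bond centre to the
    atom pair whose midpoint lies closest to it (an ASSUMED rule)."""
    # Read each atom's element from the per-pixel type grid at its peak.
    atoms = [(r, c, atom_type[r][c])
             for r, c in find_peaks(atom_heat, threshold)]
    bonds: List[Bond] = []
    for br, bc in find_peaks(bond_heat, threshold):
        best: Optional[Tuple[int, int]] = None
        best_d = max_dist
        for i, j in combinations(range(len(atoms)), 2):
            mid_r = (atoms[i][0] + atoms[j][0]) / 2
            mid_c = (atoms[i][1] + atoms[j][1]) / 2
            d = hypot(mid_r - br, mid_c - bc)
            if d < best_d:  # keep the closest pair closer than max_dist
                best, best_d = (i, j), d
        if best is not None:
            # Bond order is read from the per-pixel grid at the bond centre.
            bonds.append((best[0], best[1], bond_order[br][bc]))
    return atoms, bonds


if __name__ == "__main__":
    # Toy 1 x 9 "image": three atom centres (C, C, O) and two bond centres.
    atom_heat = [[0.9, 0, 0, 0, 0.8, 0, 0, 0, 0.95]]
    atom_type = [list("CCCCCCCCO")]
    bond_heat = [[0, 0, 0.7, 0, 0, 0, 0.6, 0, 0]]
    bond_order = [[1] * 9]
    atoms, bonds = assemble(atom_heat, atom_type, bond_heat, bond_order)
    print(atoms)  # [(0, 0, 'C'), (0, 4, 'C'), (0, 8, 'O')]
    print(bonds)  # [(0, 1, 1), (1, 2, 1)]
```

Matching by nearest midpoint keeps the sketch short, but it becomes ambiguous whenever the midpoint of two non-bonded atoms falls near a bond center, so treat it as illustrative rather than as a complete decoding scheme.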