Article

Classification of alkaloids according to the starting substances of their biosynthetic pathways using graph convolutional neural networks

Journal

BMC BIOINFORMATICS
Volume 20, Issue -, Pages -

Publisher

BMC
DOI: 10.1186/s12859-019-2963-6

Keywords

Molecular graph convolutional neural networks; Alkaloids; Metabolic pathways; Deep learning

Funding

  1. Ministry of Education, Culture, Sports, Science, and Technology of Japan [16K07223, 17K00406]
  2. Platform Project for Supporting Drug Discovery and Life Science Research - Japan Agency for Medical Research and Development [18am0101111]
  3. National Bioscience Database Center (NBDC)
  4. NAIST Bigdata Project
  5. JSPS [17H05297]
  6. Grants-in-Aid for Scientific Research [17H05297] Funding Source: KAKEN


Background: Alkaloids, a class of organic compounds that contain nitrogen bases, are synthesized mainly as secondary metabolites in plants and fungi and have a wide range of bioactivities. Although this class contains thousands of compounds, few of their biosynthetic pathways have been fully identified. In this study, we constructed a model that predicts their precursors using a novel kind of neural network, the molecular graph convolutional neural network. Molecular similarity is a crucial metric in the analysis of qualitative structure-activity relationships. However, current fingerprint representations sometimes fail to emphasize efficiently the features that matter for the target problem. It is advantageous to let the model select appropriate features in a data-driven way so that it extracts more useful information, because the choice of features substantially influences a classification or regression problem.
Results: We applied a neural network architecture that operates on an undirected graph representation of molecules. By encoding a molecule as an abstract graph, applying convolution on that graph, and training the weights of the network, the model can optimize feature selection for the training problem. By incorporating the effects of adjacent atoms recursively, graph convolutional neural networks can extract latent atom features that efficiently represent the chemical features of a molecule. To investigate alkaloid biosynthesis, we trained the network to distinguish the precursors of 566 alkaloids, which are almost all of the alkaloids whose biosynthetic pathways are known, and showed that the model could predict starting substances with an average accuracy of 97.5%.
Conclusion: We have shown that our model predicts more accurately than a random forest and a general neural network when the variables and fingerprints are not selected, whereas performance is comparable when 507 variables are carefully selected from 18,000 descriptor dimensions. Predicting pathways contributes to the understanding of alkaloid synthesis mechanisms, and applying graph-based neural network models to similar problems in bioinformatics would therefore be beneficial. We applied our model to evaluate the biosynthetic precursors of 12,000 alkaloids found in various organisms and found a power-law-like distribution.
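
The recursive neighbour aggregation described in the Results can be illustrated with a minimal sketch. This is not the authors' implementation: the toy molecule (the heavy atoms of ethylamine), the one-hot element features, the identity weight matrices, and the names `graph_conv` and `readout` are all assumptions chosen for illustration. In a trained network the weight matrices would be learned from the labelled alkaloids rather than fixed.

```python
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply matrix w by vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def graph_conv(features: List[Vector],
               neighbors: Dict[int, List[int]],
               w_self: Matrix,
               w_nbr: Matrix) -> List[Vector]:
    """One convolution step: h_v' = ReLU(W_self h_v + W_nbr * sum of neighbour h_u)."""
    updated = []
    for atom, h in enumerate(features):
        # Sum the feature vectors of all atoms bonded to this atom.
        agg = [0.0] * len(h)
        for nbr in neighbors[atom]:
            agg = [a + b for a, b in zip(agg, features[nbr])]
        z = [s + n for s, n in zip(matvec(w_self, h), matvec(w_nbr, agg))]
        updated.append([max(0.0, v) for v in z])  # ReLU non-linearity
    return updated


def readout(features: List[Vector]) -> Vector:
    """Sum atom vectors into one order-independent molecule-level vector."""
    return [sum(column) for column in zip(*features)]


# Hypothetical toy input: heavy atoms of ethylamine, C0 - C1 - N2.
ELEMENTS = ["C", "N", "O"]
atoms = ["C", "C", "N"]
neighbors = {0: [1], 1: [0, 2], 2: [1]}  # undirected bonds, listed both ways
features = [[1.0 if e == a else 0.0 for e in ELEMENTS] for a in atoms]

# Identity weights stand in for learned parameters (an assumption).
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]

one_round = graph_conv(features, neighbors, identity, identity)
hidden = graph_conv(one_round, neighbors, identity, identity)
fingerprint = readout(hidden)
print(hidden, fingerprint)
```

After one round, atom C0 only carries information about its direct neighbour C1. After the second round its vector also has a non-zero nitrogen component, even though C0 is not bonded to nitrogen, which is the sense in which repeated convolution incorporates adjacent atoms recursively. A classifier for the starting substance would then take the `fingerprint` vector as input.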
