4.7 Article

Enzyme promiscuity prediction using hierarchy-informed multi-label classification

Journal

BIOINFORMATICS
Volume 37, Issue 14, Pages 2017-2024

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bioinformatics/btab054

Funding

  1. NSF (National Science Foundation) [CCF-1909536]
  2. NIGMS (National Institute of General Medical Sciences) of the National Institutes of Health [R01GM132391]

The article presents a method to predict enzyme interactions with query molecules using machine-learning models, framing the problem as a multi-label classification task. Results show that the hierarchical multi-label neural network, EPP-HMCNF, is the best model for solving this problem.
Motivation: As experimental efforts are costly and time consuming, computational characterization of enzyme capabilities is an attractive alternative. We present and evaluate several machine-learning models to predict which of 983 distinct enzymes, as defined via the Enzyme Commission (EC) numbers, are likely to interact with a given query molecule. Our data consists of enzyme-substrate interactions from the BRENDA database. Some interactions are attributed to natural selection and involve the enzyme's natural substrates. The majority of the interactions, however, involve non-natural substrates, thus reflecting promiscuous enzymatic activities.

Results: We frame this 'enzyme promiscuity prediction' problem as a multi-label classification task. We maximally utilize inhibitor and unlabeled data to train prediction models that can take advantage of known hierarchical relationships between enzyme classes. We report that a hierarchical multi-label neural network, EPP-HMCNF, is the best model for solving this problem, outperforming k-nearest neighbors similarity-based and other machine-learning models. We show that including inhibitor information during training consistently improves predictive power, particularly for EPP-HMCNF. We also show that all promiscuity prediction models perform worse under a realistic data split than under a random data split, and worse when evaluated on non-natural substrates than on natural substrates.
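
To make the multi-label framing concrete, the following is a minimal sketch of a k-nearest neighbors similarity baseline whose labels are expanded along the EC hierarchy. It is not the authors' implementation and does not reproduce EPP-HMCNF. The fingerprint representation (sets of integer feature indices), the Tanimoto similarity, the scoring rule (similarity averaged over k neighbors), and all function names and toy data are assumptions made for illustration only.

```python
from typing import Dict, List, Set, Tuple

def ec_ancestors(ec: str) -> List[str]:
    """Return an EC number together with all of its ancestor classes.

    Example: '1.1.1.1' -> ['1', '1.1', '1.1.1', '1.1.1.1'].
    """
    parts = ec.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]

def expand_labels(ecs: Set[str]) -> Set[str]:
    """Expand a set of EC numbers so that every ancestor class is also a label."""
    expanded: Set[str] = set()
    for ec in ecs:
        expanded.update(ec_ancestors(ec))
    return expanded

def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two fingerprints stored as index sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def knn_scores(
    query: Set[int],
    training: List[Tuple[Set[int], Set[str]]],
    k: int = 2,
) -> Dict[str, float]:
    """Score every EC label for a query molecule from its k most similar neighbors.

    Assumed scoring rule: each neighbor adds (similarity / k) to every label
    it carries, including ancestor classes. Because ancestors are always
    included, a parent class never scores lower than any of its children.
    """
    ranked = sorted(training, key=lambda item: tanimoto(query, item[0]), reverse=True)
    scores: Dict[str, float] = {}
    for fingerprint, labels in ranked[:k]:
        sim = tanimoto(query, fingerprint)
        for label in expand_labels(labels):
            scores[label] = scores.get(label, 0.0) + sim / k
    return scores

# Hypothetical toy data: (fingerprint, set of interacting EC numbers).
TRAINING = [
    ({1, 2, 3, 4}, {"1.1.1.1"}),
    ({1, 2, 3, 5}, {"1.1.1.2", "2.7.1.1"}),  # one molecule, several enzymes
    ({7, 8, 9}, {"3.4.21.4"}),
]

if __name__ == "__main__":
    scores = knn_scores({1, 2, 3}, TRAINING, k=2)
    for label in sorted(scores):
        print(label, round(scores[label], 3))
```

Each molecule can carry several EC labels at once, which is what makes the task multi-label rather than multi-class. Expanding labels to their ancestors is one simple way to keep scores consistent with the hierarchy; a hierarchical neural network would instead learn those relationships during training.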
