Article

SteroidXtract: Deep Learning-Based Pattern Recognition Enables Comprehensive and Rapid Extraction of Steroid-Like Metabolic Features for Automated Biology-Driven Metabolomics

Journal

ANALYTICAL CHEMISTRY
Volume 93, Issue 14, Pages 5735-5743

Publisher

AMER CHEMICAL SOC
DOI: 10.1021/acs.analchem.0c04834

Funding

  1. University of British Columbia [F18-03001]
  2. Canadian Foundation for Innovation (CFI) [38159]
  3. UBC Support for Teams to Advance Interdisciplinary Research Award [F19-05720]
  4. New Frontiers in Research Fund/Exploration [NFRFE-201900789]
  5. National Science and Engineering Research Council (NSERC) [RGPIN-2020-04895]
  6. NSERC [DGECR2020-00189, RGPIN-2019-04837]
  7. Discovery Accelerator Supplement [RGDAS-2019-00033]
  8. Canada Foundation for Innovation Grant (CFI) [32631]
  9. NSERC CGS-M Fellowship
  10. NSERC CGS-D Fellowship

This study introduces SteroidXtract, a CNN-based bioinformatics tool that can recognize steroid molecules in untargeted metabolomics by their unique MS2 spectral patterns. The tool has shown high sensitivity, specificity, and robustness in metabolomics studies.
Despite the vast amount of metabolic information that can be captured in untargeted metabolomics, many biological applications are looking for a biology-driven metabolomics platform that targets a set of metabolites that are relevant to the given biological question. Steroids are a class of important molecules that play critical roles in many physiological systems and diseases. Besides known steroids, there are a large number of unknown steroids that have not been reported in the literature. The ability to rapidly detect and quantify both known and unknown steroid molecules in a biological sample can greatly accelerate a broad range of steroid-focused life science research. This work describes the development and application of SteroidXtract, a convolutional neural network (CNN)-based bioinformatics tool that can recognize steroid molecules in mass spectrometry (MS)-based untargeted metabolomics using their unique tandem MS (MS2) spectral patterns. SteroidXtract was trained using a comprehensive set of standard MS2 spectra from MassBank of North America (MoNA) and an in-house steroid library. Data augmentation strategies, including intensity thresholding and Gaussian noise addition, were created and applied to minimize data overfitting caused by the limited number of standard steroid MS2 spectra. The CNN model embedded in SteroidXtract was further compared with random forest and XGBoost using nested cross-validations to demonstrate its performance. Finally, SteroidXtract was applied in several metabolomics studies to demonstrate its sensitivity, specificity, and robustness. Compared to conventional statistics-driven metabolomics data interpretation, our work offers a novel automated biology-driven approach to interpreting untargeted metabolomics data, prioritizing biologically important molecules with high throughput and sensitivity.
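The two augmentation strategies named in the abstract, intensity thresholding and Gaussian noise addition, can be illustrated with a minimal sketch. This is not the SteroidXtract implementation. The function names, the representation of a spectrum as a list of (m/z, intensity) pairs, the relative thresholds, and the choice to scale the noise with each peak's intensity are all assumptions made for illustration; the abstract does not state the actual parameters.

```python
import random

# Hypothetical representation: an MS2 spectrum is a list of (mz, intensity) pairs.

def threshold_intensities(spectrum, rel_threshold=0.01):
    """Keep only peaks at or above rel_threshold times the base-peak intensity."""
    if not spectrum:
        return []
    base_peak = max(intensity for _, intensity in spectrum)
    cutoff = rel_threshold * base_peak
    return [(mz, intensity) for mz, intensity in spectrum if intensity >= cutoff]

def add_gaussian_noise(spectrum, rel_sigma=0.05, rng=None):
    """Perturb each intensity with zero-mean Gaussian noise.

    Assumption: the noise standard deviation is rel_sigma times the peak's
    own intensity. Negative results are clipped to zero; m/z is unchanged.
    """
    rng = rng or random.Random()
    noisy = []
    for mz, intensity in spectrum:
        perturbed = intensity + rng.gauss(0.0, rel_sigma * intensity)
        noisy.append((mz, max(perturbed, 0.0)))
    return noisy

def augment(spectrum, n_copies=3, thresholds=(0.01, 0.05), rel_sigma=0.05, seed=0):
    """Produce n_copies augmented variants of one reference spectrum."""
    rng = random.Random(seed)
    variants = []
    for _ in range(n_copies):
        kept = threshold_intensities(spectrum, rng.choice(thresholds))
        variants.append(add_gaussian_noise(kept, rel_sigma, rng))
    return variants

if __name__ == "__main__":
    # Made-up example peaks, not taken from any spectral library.
    reference = [(97.06, 100.0), (109.06, 40.0), (253.19, 0.5), (271.20, 3.0)]
    for variant in augment(reference):
        print(variant)
```

Each variant keeps the m/z values of the reference spectrum but differs in which low-intensity peaks survive and in the exact intensities, which is one way to enlarge a small set of standard spectra before training a classifier.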
