4.8 Article

SteroidXtract: Deep Learning-Based Pattern Recognition Enables Comprehensive and Rapid Extraction of Steroid-Like Metabolic Features for Automated Biology-Driven Metabolomics

Journal

ANALYTICAL CHEMISTRY
Volume 93, Issue 14, Pages 5735-5743

Publisher

AMER CHEMICAL SOC
DOI: 10.1021/acs.analchem.0c04834

Funding

  1. University of British Columbia [F18-03001]
  2. Canada Foundation for Innovation (CFI) [38159]
  3. UBC Support for Teams to Advance Interdisciplinary Research Award [F19-05720]
  4. New Frontiers in Research Fund/Exploration [NFRFE-201900789]
  5. Natural Sciences and Engineering Research Council (NSERC) [RGPIN-2020-04895]
  6. NSERC [DGECR2020-00189, RGPIN-2019-04837]
  7. Discovery Accelerator Supplement [RGDAS-2019-00033]
  8. Canada Foundation for Innovation Grant (CFI) [32631]
  9. NSERC CGS-M Fellowship
  10. NSERC CGS-D Fellowship

Abstract

This study introduces SteroidXtract, a CNN-based bioinformatics tool that can recognize steroid molecules in untargeted metabolomics by their unique MS2 spectral patterns. The tool has shown high sensitivity, specificity, and robustness in metabolomics studies.
Despite the vast amount of metabolic information that can be captured in untargeted metabolomics, many biological applications are looking for a biology-driven metabolomics platform that targets a set of metabolites that are relevant to the given biological question. Steroids are a class of important molecules that play critical roles in many physiological systems and diseases. Besides known steroids, there are a large number of unknown steroids that have not been reported in the literature. The ability to rapidly detect and quantify both known and unknown steroid molecules in a biological sample can greatly accelerate a broad range of steroid-focused life science research. This work describes the development and application of SteroidXtract, a convolutional neural network (CNN)-based bioinformatics tool that can recognize steroid molecules in mass spectrometry (MS)-based untargeted metabolomics using their unique tandem MS (MS2) spectral patterns. SteroidXtract was trained using a comprehensive set of standard MS2 spectra from MassBank of North America (MoNA) and an in-house steroid library. Data augmentation strategies, including intensity thresholding and Gaussian noise addition, were created and applied to minimize data overfitting caused by the limited number of standard steroid MS2 spectra. The CNN model embedded in SteroidXtract was further compared with random forest and XGBoost using nested cross-validations to demonstrate its performance. Finally, SteroidXtract was applied in several metabolomics studies to demonstrate its sensitivity, specificity, and robustness. Compared to conventional statistics-driven metabolomics data interpretation, our work offers a novel automated biology-driven approach to interpreting untargeted metabolomics data, prioritizing biologically important molecules with high throughput and sensitivity.
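The two augmentation strategies named in the abstract, intensity thresholding and Gaussian noise addition, can be illustrated with a minimal sketch. This is not the SteroidXtract implementation: the function names, the (m/z, intensity) tuple representation, the threshold values, the noise level `sigma`, and the example peaks are all assumptions chosen for illustration.

```python
import random


def normalize(spectrum):
    """Scale intensities so the most intense peak equals 100."""
    if not spectrum:
        return []
    base = max(intensity for _, intensity in spectrum)
    return [(mz, 100.0 * intensity / base) for mz, intensity in spectrum]


def intensity_threshold(spectrum, min_rel_intensity):
    """Keep only peaks at or above a relative-intensity cutoff (percent of base peak)."""
    return [(mz, i) for mz, i in normalize(spectrum) if i >= min_rel_intensity]


def add_gaussian_noise(spectrum, sigma, rng):
    """Add zero-mean Gaussian noise to each intensity; drop peaks that fall to zero or below."""
    noisy = [(mz, i + rng.gauss(0.0, sigma)) for mz, i in spectrum]
    return [(mz, i) for mz, i in noisy if i > 0.0]


def augment(spectrum, thresholds=(1.0, 5.0), sigma=2.0, copies=2, seed=0):
    """Return augmented variants of one MS2 spectrum.

    For every threshold (hypothetical values), emit the thresholded spectrum
    plus `copies` noisy versions of it.
    """
    rng = random.Random(seed)  # seeded so the augmentation is reproducible
    variants = []
    for cutoff in thresholds:
        filtered = intensity_threshold(spectrum, cutoff)
        variants.append(filtered)
        for _ in range(copies):
            variants.append(add_gaussian_noise(filtered, sigma, rng))
    return variants


if __name__ == "__main__":
    # Made-up peaks (m/z, intensity), not taken from any spectral library.
    example = [(97.0, 100.0), (109.0, 80.0), (123.0, 3.0), (289.0, 40.0)]
    for variant in augment(example):
        print(variant)
```

Each input spectrum yields several slightly different training examples, which is the general idea behind using augmentation to reduce overfitting when few standard steroid spectra are available. In this sketch, thresholding mimics spectra acquired at different signal levels, and the noise mimics intensity variation between acquisitions.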
