Article

Machine Learning-Based Hazard-Driven Prioritization of Features in Nontarget Screening of Environmental High-Resolution Mass Spectrometry Data

Journal

Publisher

AMER CHEMICAL SOC
DOI: 10.1021/acs.est.3c00304

Keywords

ToxCast; Tox21; toxicity prediction; HRMS/MS; supervised classification; extreme gradient boosting; SIRIUS

MLinvitroTox predicts the toxicity of unidentified NTS HRMS/MS features by applying machine learning to their fragmentation spectra. It uses molecular fingerprints derived from MS2 spectra to rapidly classify thousands of HRMS/MS features as toxic or nontoxic for target-specific and cytotoxic endpoints. Over a quarter of the toxic endpoints and the majority of the associated mechanistic targets were predicted with sensitivities exceeding 0.95.
MLinvitroTox maps toxicologically relevant pollution in aquatic environments by predicting the toxicity of unidentified NTS HRMS/MS features from fragmentation spectra via machine learning. Nontarget high-resolution mass spectrometry screening (NTS HRMS/MS) can detect thousands of organic substances in environmental samples. However, new strategies are needed to focus time-intensive identification efforts on features with the highest potential to cause adverse effects instead of the most abundant ones. To address this challenge, we developed MLinvitroTox, a machine learning framework that uses molecular fingerprints derived from fragmentation spectra (MS2) for a rapid classification of thousands of unidentified HRMS/MS features as toxic/nontoxic based on nearly 400 target-specific and over 100 cytotoxic endpoints from ToxCast/Tox21. Model development results demonstrated that using customized molecular fingerprints and models, over a quarter of toxic endpoints and the majority of the associated mechanistic targets could be accurately predicted with sensitivities exceeding 0.95. Notably, SIRIUS molecular fingerprints and XGBoost (Extreme Gradient Boosting) models with SMOTE (Synthetic Minority Oversampling Technique) for handling data imbalance were a universally successful and robust modeling configuration. Validation of MLinvitroTox on MassBank spectra showed that toxicity could be predicted from molecular fingerprints derived from MS2 with an average balanced accuracy of 0.75. By applying MLinvitroTox to environmental HRMS/MS data, we confirmed the experimental results obtained with target analysis and narrowed the analytical focus from tens of thousands of detected signals to 783 features linked to potential toxicity, including 109 spectral matches and 30 compounds with confirmed toxic activity.
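
The abstract reports sensitivity and balanced accuracy and names SMOTE as the remedy for class imbalance. The sketch below illustrates those three ideas with the Python standard library only. It is not the authors' implementation: the function names, the toy labels, and the toy binary fingerprints are all invented for illustration, and a real pipeline would more likely rely on established libraries for oversampling and gradient boosting.

```python
import random


def sensitivity(y_true, y_pred):
    """True-positive rate: share of toxic (1) samples predicted as toxic."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)


def specificity(y_true, y_pred):
    """True-negative rate: share of nontoxic (0) samples predicted as nontoxic."""
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tn / (tn + fp)


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity; insensitive to class imbalance."""
    return 0.5 * (sensitivity(y_true, y_pred) + specificity(y_true, y_pred))


def smote_sketch(minority, n_new, k=2, seed=0):
    """SMOTE-style oversampling (hypothetical helper).

    Each synthetic sample lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [m for j, m in enumerate(minority) if j != i]
        # Rank the remaining minority samples by squared Euclidean distance.
        others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy, invented data: 3 toxic and 7 nontoxic features for one endpoint.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]

# Toy binary fingerprints for the minority (toxic) class.
toxic_fingerprints = [[1, 0, 1, 0], [1, 1, 1, 0], [0, 0, 1, 1]]
extra_toxic = smote_sketch(toxic_fingerprints, n_new=4)

print("sensitivity:", sensitivity(y_true, y_pred))
print("balanced accuracy:", balanced_accuracy(y_true, y_pred))
print("synthetic minority samples:", len(extra_toxic))
```

Balanced accuracy is used here because plain accuracy would look high for a model that labels nearly everything nontoxic when toxic examples are rare. Note that interpolating binary fingerprints yields fractional values, so this sketch only conveys the idea of synthesizing minority samples, not how any particular fingerprint format should be handled.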
