Article

Array-Based Machine Learning for Functional Group Detection in Electron Ionization Mass Spectrometry

Journal

ACS Omega

Publisher

American Chemical Society
DOI: 10.1021/acsomega.3c01684

Mass spectrometry is a technique widely used for complex chemical analysis. Artificial intelligence methods, specifically convolutional neural network (CNN) and logistic regression models, were trained on mass spectral data to identify functional groups. The logistic regression models achieved higher accuracy on new data, and the 0-100 m/z range was found to be the most beneficial for functional group analysis.
Mass spectrometry is a ubiquitous technique capable of complex chemical analysis. The fragmentation patterns that appear in mass spectrometry are an excellent target for artificial intelligence methods to automate and expedite the analysis of data to identify targets such as functional groups. To develop this approach, we trained models on electron ionization (a reproducible hard fragmentation technique) mass spectra so that not only the final model accuracies but also the reasoning behind model assignments could be evaluated. The convolutional neural network (CNN) models were trained on 2D images of the spectra using transfer learning of Inception V3, and the logistic regression models were trained using array-based data and the scikit-learn implementation in Python. Our training dataset consisted of 21,166 mass spectra from the United States' National Institute of Standards and Technology (NIST) Webbook. The data were used to train models to identify functional groups, both specific (e.g., amines, esters) and generalized classifications (aromatics, oxygen-containing functional groups, and nitrogen-containing functional groups). We found that the highest final accuracies in identifying new data were obtained using logistic regression rather than transfer learning on CNN models. It was also determined that the mass range most beneficial for functional group analysis is 0-100 m/z. We also correctly identified the functional groups of example molecules selected from both the NIST database and experimental data. Beyond functional group analysis, we have also developed a methodology to identify impactful fragments for the accurate detection of the models' targets. The results demonstrate a potential pathway for analyzing and screening substantial amounts of mass spectral data.
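The array-based approach can be illustrated with a minimal sketch. It rests on several assumptions. Each spectrum is taken to be a list of (m/z, intensity) peaks. The peaks are binned to integer nominal masses over the 0-100 m/z window, and one binary logistic regression classifier is trained per functional group. The helpers `spectrum_to_array` and `make_toy_dataset` are hypothetical, and so are the synthetic "amine" data and the use of m/z 30 as the marker fragment. They are not the authors' code or the NIST data. Ranking the learned coefficients is shown as one simple way to surface impactful fragments. It is not necessarily the methodology the paper describes.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

MAX_MZ = 100  # upper bound of the 0-100 m/z window highlighted in the abstract


def spectrum_to_array(peaks, max_mz=MAX_MZ):
    """Bin (m/z, intensity) pairs into a fixed-length vector indexed by nominal mass.

    Peaks above max_mz are discarded, and the result is scaled so the
    largest in-window peak equals 1.0.
    """
    vec = np.zeros(max_mz + 1)
    for mz, intensity in peaks:
        idx = int(round(mz))
        if 0 <= idx <= max_mz:
            vec[idx] = max(vec[idx], intensity)
    peak_max = vec.max()
    return vec / peak_max if peak_max > 0 else vec


def make_toy_dataset(n=400, seed=0):
    """Hypothetical toy data: 'positive' spectra carry a base peak at m/z 30
    (the CH2=NH2+ fragment typical of primary amines); other peaks are random noise."""
    rng = np.random.default_rng(seed)
    X, y = [], []
    for i in range(n):
        masses = rng.choice(MAX_MZ + 1, size=10, replace=False)
        peaks = [(float(m), float(rng.uniform(0.05, 0.5))) for m in masses]
        label = i % 2  # alternate negative/positive examples
        if label:
            peaks.append((30.0, 1.0))
        X.append(spectrum_to_array(peaks))
        y.append(label)
    return np.array(X), np.array(y)


# Train one binary classifier for the (hypothetical) "amine present" label.
X, y = make_toy_dataset()
model = LogisticRegression(max_iter=1000)
model.fit(X, y)

# Rank m/z bins by learned weight: large positive coefficients mark fragments
# that push the prediction toward "functional group present".
weights = model.coef_[0]
top_mz = np.argsort(weights)[::-1][:5]
print("Most impactful m/z bins:", top_mz.tolist())

# Score a new spectrum given as a raw peak list.
query = spectrum_to_array([(30.0, 100.0), (44.0, 20.0), (58.0, 5.0)])
p_amine = model.predict_proba(query.reshape(1, -1))[0, 1]
print("P(amine present) =", round(float(p_amine), 3))
```

Reading `model.coef_` this way works only because every input shares the same m/z-indexed layout. With the 0-100 window, each weight maps directly to one nominal fragment mass, which is part of what makes a linear model on array data easier to interpret than a CNN trained on spectrum images.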
