Article

MS2AI: automated repurposing of public peptide LC-MS data for machine learning applications

Journal

BIOINFORMATICS
Volume 38, Issue 3, Pages 875-877

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bioinformatics/btab701

Funding

  1. Velux Foundation [00028116]

The MS2AI pipeline automates the process of gathering large quantities of MS data for machine learning applications, addressing three major limitations in ML within the LC-MS field.
Motivation: Liquid chromatography-mass spectrometry (LC-MS) is the established standard for analyzing the proteome in biological samples by identifying and quantifying thousands of proteins. Machine learning (ML) promises to considerably improve the analysis of the resulting data; however, no tool yet mediates the path from raw data to modern ML applications. More specifically, ML applications are currently hampered by three major limitations: (i) absence of balanced training data with large sample size; (ii) unclear definition of sufficiently information-rich data representations for tasks such as peptide identification; (iii) lack of benchmarking of ML methods on specific LC-MS problems.

Results: We created the MS2AI pipeline, which automates the process of gathering vast quantities of MS data for large-scale ML applications. The software retrieves raw data either from in-house sources or from the proteomics identifications database, PRIDE. The raw data are then stored in a standardized format amenable to ML, encompassing MS1/MS2 spectra and peptide identifications. This tool bridges the gap between MS and AI, and to demonstrate this we also present an ML application in the form of a convolutional neural network for the identification of oxidized peptides.
