Article

A Modular and Expandable Ecosystem for Metabolomics Data Annotation in R

Journal

METABOLITES
Volume 12, Issue 2

Publisher

MDPI
DOI: 10.3390/metabo12020173

Keywords

metabolomics; untargeted analysis; annotation; R programming; small-compound databases; reproducible research

Funding

  1. Department of Innovation
  2. University of the Autonomous Province of Bozen/Bolzano
  3. BMBF [031L0107]
  4. DFG [WI 4382/11-1, 431572533, WI 4382/10-1 (425789784)]


Liquid chromatography-mass spectrometry (LC-MS)-based untargeted metabolomics experiments require customized annotation workflows. We present an ecosystem of R packages that provide a modular infrastructure for annotating untargeted metabolomics data, including initial annotation and reference library comparison. The system supports various data formats and offers highly customizable functionality for most untargeted LC-MS data.
Liquid chromatography-mass spectrometry (LC-MS)-based untargeted metabolomics experiments have become increasingly popular because of the wide range of metabolites that can be analyzed and the possibility of measuring novel compounds. LC-MS instrumentation and analysis conditions can differ substantially among laboratories and experiments, resulting in non-standardized datasets that demand customized annotation workflows. We present an ecosystem of R packages, centered around the MetaboCoreUtils, MetaboAnnotation and CompoundDb packages, that together provide a modular infrastructure for the annotation of untargeted metabolomics data. Initial annotation can be performed based on MS1 properties such as m/z and retention times, followed by an MS2-based annotation in which experimental fragment spectra are compared against a reference library. Such reference databases can be created and managed with the CompoundDb package. The ecosystem supports data in a variety of formats, including, but not limited to, MSP, MGF, mzML, mzXML and netCDF, as well as MassBank text files and SQL databases. Through its highly customizable functionality, the presented infrastructure allows users to build reproducible annotation workflows tailored to most untargeted LC-MS-based datasets. All core functionality, which supports base R data types, is exported, which also facilitates its re-use in other R packages. Finally, all packages are thoroughly unit-tested and documented and are available on GitHub and through Bioconductor.
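
The two annotation stages described in the abstract can be illustrated with a minimal, language-neutral sketch. It is written in Python only to stay self-contained and is not the API or the implementation of MetaboCoreUtils, MetaboAnnotation or CompoundDb. All function names (`mz_for_mh`, `match_ms1`, `cosine_similarity`), the feature and reference records, the retention times and the tolerance values are hypothetical choices for illustration. The MS1 step assumes a single [M+H]+ adduct and a ppm window on m/z with an optional retention-time window; the MS2 step uses a simple greedy cosine score, which is one common similarity measure and is assumed here rather than taken from the paper.

```python
import math

PROTON_MASS = 1.007276  # approximate proton mass in Da, used for the [M+H]+ adduct


def mz_for_mh(monoisotopic_mass):
    """Return the m/z of the singly charged [M+H]+ ion for a neutral mass."""
    return monoisotopic_mass + PROTON_MASS


def match_ms1(features, references, ppm=10.0, rt_tolerance=None):
    """Pair features with references whose [M+H]+ m/z lies within a ppm window.

    If rt_tolerance (seconds) is given, the retention times must also agree.
    """
    hits = []
    for feature in features:
        for ref in references:
            target_mz = mz_for_mh(ref["mass"])
            if abs(feature["mz"] - target_mz) > target_mz * ppm / 1e6:
                continue
            if rt_tolerance is not None and abs(feature["rt"] - ref["rt"]) > rt_tolerance:
                continue
            hits.append((feature["id"], ref["name"]))
    return hits


def cosine_similarity(spec_a, spec_b, tolerance=0.01):
    """Greedy cosine score between two spectra given as lists of (mz, intensity)."""
    used = set()
    dot = 0.0
    for mz_a, int_a in spec_a:
        best = None  # (m/z difference, index in spec_b, intensity)
        for j, (mz_b, int_b) in enumerate(spec_b):
            diff = abs(mz_a - mz_b)
            if j not in used and diff <= tolerance and (best is None or diff < best[0]):
                best = (diff, j, int_b)
        if best is not None:
            used.add(best[1])  # each reference peak is matched at most once
            dot += int_a * best[2]
    norm_a = math.sqrt(sum(i * i for _, i in spec_a))
    norm_b = math.sqrt(sum(i * i for _, i in spec_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical example data: neutral monoisotopic masses, made-up retention times.
references = [
    {"name": "caffeine", "mass": 194.0804, "rt": 118.0},
    {"name": "glucose", "mass": 180.0634, "rt": 60.0},
]
features = [
    {"id": "F1", "mz": 195.0876, "rt": 120.0},
    {"id": "F2", "mz": 181.0710, "rt": 45.0},
    {"id": "F3", "mz": 300.0000, "rt": 200.0},
]

mz_only_hits = match_ms1(features, references, ppm=10.0)
mz_rt_hits = match_ms1(features, references, ppm=10.0, rt_tolerance=5.0)
print(mz_only_hits)
print(mz_rt_hits)
```

In this sketch, adding the retention-time window removes candidates that agree on m/z alone, which is the reason for combining both MS1 properties before moving on to the MS2 comparison.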
