Article

The National Microbiome Data Collaborative Data Portal: an integrated multi-omics microbiome data resource

Journal

NUCLEIC ACIDS RESEARCH
Volume 50, Issue D1, Pages D828-D836

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/nar/gkab990

Funding

  1. Genomic Science Program in the U.S. Department of Energy, Office of Science, Office of Biological and Environmental Research (BER) [DE-AC02-05CH11231, 89233218CNA000001, DE-AC05-00OR22725, DE-AC05-76RL01830]

The NMDC Data Portal supports exploration of, and access to, multi-omics microbiome data and currently hosts 10.2 TB of it. The portal relies on a flexible data schema and on workflows that generate annotated data products, and it offers several interactive search features.
The National Microbiome Data Collaborative (NMDC) Data Portal (https://data.microbiomedata.org) supports microbiome multi-omics data exploration and access through an integrated, distributed data framework aligned with the FAIR (Findable, Accessible, Interoperable and Reusable) data principles (1). The NMDC Data Portal currently hosts 10.2 terabytes of multi-omics microbiome data, spanning five data types (metagenomes, metatranscriptomes, metaproteomes, metabolomes, and natural organic matter characterizations), generated at two Department of Energy User Facilities: the Joint Genome Institute (JGI) at Lawrence Berkeley National Laboratory (LBNL) and the Environmental Molecular Sciences Laboratory (EMSL) at Pacific Northwest National Laboratory (PNNL). A flexible data schema (https://github.com/microbiomedata/nmdc-schema) leveraging community-driven standards underpins how data is managed and integrated. Annotated multi-omics data products are produced by the NMDC workflows and linked through common biosamples to enable search capabilities based on environmental context, instrumentation, and functional attributes. As a pilot system, the NMDC Data Portal offers download capabilities and several search components, including an interactive geographic visualization of samples; an environmental classification distribution visualized through an interactive Sankey diagram; a time-series slider to select longitudinal samples of interest; and an UpSet plot displaying the number of multi-omics datasets generated from the same biosample within a study.
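
The idea of linking data products through a common biosample, and of counting which combinations of data types exist per biosample (the quantity an UpSet plot displays), can be illustrated with a minimal Python sketch. The records, identifiers, and function names below are hypothetical and chosen only for illustration; they are not taken from the NMDC schema or the portal's code.

```python
from collections import Counter, defaultdict

# Hypothetical records: (biosample_id, omics_type). The identifiers and
# the flat tuple layout are assumptions, not the NMDC schema.
records = [
    ("sample-A", "metagenome"),
    ("sample-A", "metatranscriptome"),
    ("sample-A", "metaproteome"),
    ("sample-B", "metagenome"),
    ("sample-B", "metabolome"),
    ("sample-C", "metagenome"),
    ("sample-C", "metabolome"),
    ("sample-D", "metagenome"),
]


def omics_by_biosample(rows):
    """Group the omics types observed for each biosample."""
    grouped = defaultdict(set)
    for biosample_id, omics_type in rows:
        grouped[biosample_id].add(omics_type)
    return dict(grouped)


def upset_counts(grouped):
    """Count biosamples per exact combination of omics types."""
    return Counter(frozenset(types) for types in grouped.values())


grouped = omics_by_biosample(records)
counts = upset_counts(grouped)

# Print combinations, most frequent first, then alphabetically.
for combo, n in sorted(counts.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(", ".join(sorted(combo)), n)
```

Each key of `counts` is one exact combination of data types, and each value is the number of biosamples with that combination, which corresponds to one bar of an UpSet plot.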
