Article

Uniform genomic data analysis in the NCI Genomic Data Commons

Journal

NATURE COMMUNICATIONS
Volume 12, Issue 1

Publisher

NATURE PORTFOLIO
DOI: 10.1038/s41467-021-21254-9

Funding

  1. National Cancer Institute, National Institutes of Health [17X053, 14X050, HHSN261200800001E]

The goal of the National Cancer Institute's Genomic Data Commons (GDC) is to provide the cancer research community with a data repository of uniformly processed genomic and clinical data to support precision medicine through data sharing and collaborative analysis. The initial dataset includes various data types from the NCI TCGA and TARGET projects, and data production started in June 2015 using an OpenStack-based private cloud. The GDC has analyzed more than 50,000 raw sequencing data inputs and generated different data types using the latest human genome reference build, GRCh38. These data are available for download and exploratory analysis at the GDC Data Portal and Legacy Archive.
The goal of the National Cancer Institute's (NCI's) Genomic Data Commons (GDC) is to provide the cancer research community with a data repository of uniformly processed genomic and associated clinical data that enables data sharing and collaborative analysis in support of precision medicine. The initial GDC dataset includes genomic, epigenomic, proteomic, clinical and other data from the NCI TCGA and TARGET programs. Data production for the GDC started in June 2015 using an OpenStack-based private cloud. By June 2016, the GDC had analyzed more than 50,000 raw sequencing data inputs, as well as multiple other data types. Using the latest human genome reference build, GRCh38, the GDC generated a variety of data types, from aligned reads to somatic mutations, gene expression, miRNA expression, DNA methylation status, and copy number variation. In this paper, we describe the pipelines and workflows used to process and harmonize the data in the GDC. The generated data, as well as the original input files from TCGA and TARGET, are available for download and exploratory analysis at the GDC Data Portal and Legacy Archive (https://gdc.cancer.gov/).
