GenoMetric Query Language: a novel approach to large-scale genomic data management

BIOINFORMATICS (2015)

Journal: BIOINFORMATICS, Volume 31, Issue 12, Pages 1881-1888

Publisher: OXFORD UNIV PRESS

DOI: 10.1093/bioinformatics/btv048

Funding: 'Data-Driven Genomic Computing [GenData 2020]' PRIN project, Italian Ministry of the University and Research (MIUR)

Abstract

Motivation: Improvements in sequencing technologies and data processing pipelines are rapidly producing sequencing data, with associated high-level features, for many individual genomes in multiple biological and clinical conditions. These data allow for data-driven genomic, transcriptomic and epigenomic characterizations, but they require state-of-the-art 'big data' computing strategies, with abstraction levels beyond the capabilities of available tools.

Results: We propose a high-level, declarative GenoMetric Query Language (GMQL) and a toolkit for its use. GMQL operates downstream of raw data preprocessing pipelines and supports queries over thousands of heterogeneous datasets and samples; as such, it is key to genomic 'big data' analysis. GMQL leverages a simple data model that provides abstractions of both genomic region data and the associated experimental, biological and clinical metadata, as well as interoperability among many data formats. Based on the Hadoop framework and the Apache Pig platform, GMQL ensures high scalability, expressivity, flexibility and simplicity of use, as demonstrated by several biological query examples on ENCODE and TCGA datasets.