Article

Using Genome Query Language to uncover genetic variation

Journal

BIOINFORMATICS
Volume 30, Issue 1, Pages 1-8

Publisher

OXFORD UNIV PRESS
DOI: 10.1093/bioinformatics/btt250

Funding

  1. NIH [NIH 5R01-HG004962]
  2. iDASH project [U54 HL108460]
  3. CSRO scholarship
  4. NATIONAL CANCER INSTITUTE [P30CA023100] Funding Source: NIH RePORTER
  5. NATIONAL HEART, LUNG, AND BLOOD INSTITUTE [U54HL108460] Funding Source: NIH RePORTER
  6. NATIONAL HUMAN GENOME RESEARCH INSTITUTE [R01HG004962] Funding Source: NIH RePORTER

Motivation: With high-throughput DNA sequencing costs dropping below $1000 for human genomes, data storage, retrieval and analysis have become the major bottlenecks in biological studies. To address these large-data challenges, we advocate a clean separation between evidence collection and inference in variant calling. We define and implement a Genome Query Language (GQL) that allows for the rapid collection of the evidence needed for calling variants.

Results: We provide a number of cases to showcase the use of GQL for complex evidence collection, such as the evidence for large structural variations. Specifically, typical GQL queries can be written in 5-10 lines of high-level code and search large datasets (100 GB) in minutes. We also demonstrate its complementarity with other variant calling tools: popular variant calling tools can achieve a speed-up of one order of magnitude by using GQL to retrieve evidence. Finally, we show how GQL can be used to query and compare multiple datasets. By separating evidence from inference in variant calling, GQL frees variant detection tools from data-intensive evidence collection and lets them focus on statistical inference.
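The separation of evidence collection from inference described in the abstract can be illustrated with a minimal Python sketch. It is not GQL syntax and not the authors' implementation. The `ReadPair` type, the function names, and the thresholds are all hypothetical, chosen only to show the two stages as independent steps. Evidence here means read pairs whose mapped span is larger than expected, a common signal for deletions. Inference is a deliberately crude clustering of overlapping evidence.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class ReadPair:
    """Hypothetical, simplified paired-end alignment record."""
    chrom: str
    left_start: int  # leftmost mapped position of the pair
    right_end: int   # rightmost mapped position of the pair

    @property
    def span(self) -> int:
        return self.right_end - self.left_start


def collect_discordant(pairs: Iterable[ReadPair], max_span: int) -> List[ReadPair]:
    """Evidence collection: keep pairs whose mapped span exceeds max_span."""
    return [p for p in pairs if p.span > max_span]


def call_deletions(evidence: Iterable[ReadPair],
                   min_support: int) -> List[Tuple[str, int, int, int]]:
    """Inference (assumed, simplistic): merge overlapping evidence on the same
    chromosome and report (chrom, start, end, support) for each cluster with
    at least min_support pairs."""
    calls: List[Tuple[str, int, int, int]] = []
    cluster: List[ReadPair] = []
    cluster_end = 0

    def flush() -> None:
        # Emit the current cluster only if enough pairs support it.
        if len(cluster) >= min_support:
            calls.append((cluster[0].chrom, cluster[0].left_start,
                          cluster_end, len(cluster)))

    for p in sorted(evidence, key=lambda r: (r.chrom, r.left_start)):
        if cluster and p.chrom == cluster[0].chrom and p.left_start <= cluster_end:
            cluster.append(p)
            cluster_end = max(cluster_end, p.right_end)
        else:
            flush()
            cluster = [p]
            cluster_end = p.right_end
    flush()
    return calls


# Toy data: one concordant pair, three overlapping discordant pairs on chr1,
# and a single unsupported discordant pair on chr2.
pairs = [
    ReadPair("chr1", 100, 400),
    ReadPair("chr1", 1000, 6000),
    ReadPair("chr1", 1200, 6100),
    ReadPair("chr1", 1500, 6300),
    ReadPair("chr2", 50, 9000),
]
evidence = collect_discordant(pairs, max_span=1000)
calls = call_deletions(evidence, min_support=2)
print(calls)
```

Because `call_deletions` sees only the filtered evidence, the inference step could be swapped for a statistical model without touching the data-intensive filtering step, which is the division of labour the abstract argues for.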
