Hadoop-BAM: directly manipulating next generation sequencing data in the cloud

BIOINFORMATICS (2012)

Journal

BIOINFORMATICS

Volume 28, Issue 6, Pages 876-877

Publisher

OXFORD UNIV PRESS

DOI: 10.1093/bioinformatics/bts054

Funding

Finnish Funding Agency for Technology and Innovation Tekes
Academy of Finland [139402]

Abstract

Hadoop-BAM is a novel library for the scalable manipulation of aligned next-generation sequencing data in the Hadoop distributed computing framework. It acts as an integration layer between analysis applications and BAM files that are processed using Hadoop. Hadoop-BAM solves the issues related to BAM data access by presenting a convenient API for implementing map and reduce functions that can directly operate on BAM records. It builds on top of the Picard SAM JDK, so tools that rely on the Picard API are expected to be easily convertible to support large-scale distributed processing. In this article we demonstrate the use of Hadoop-BAM by building a coverage summarizing tool for the Chipster genome browser. Our results show that Hadoop offers good scalability, and one should avoid moving data in and out of Hadoop between analysis steps.
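
The coverage summarization mentioned in the abstract can be pictured as a map step over aligned reads followed by a reduce step per genomic bin. The sketch below is only an illustration in plain Python: it does not use Hadoop, Hadoop-BAM, or the Picard API, and the `Alignment` tuple, the function names, and the 100-base bin size are all assumptions introduced here, not details taken from the article.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical stand-in for an aligned read:
# (reference name, 1-based start, 1-based inclusive end).
Alignment = Tuple[str, int, int]
BinKey = Tuple[str, int]

BIN_SIZE = 100  # assumed summary resolution in bases


def map_alignment(aln: Alignment, bin_size: int = BIN_SIZE) -> Iterator[Tuple[BinKey, int]]:
    """Map step: emit ((reference, bin index), covered bases) for every bin the read overlaps."""
    ref, start, end = aln
    first_bin = (start - 1) // bin_size
    last_bin = (end - 1) // bin_size
    for b in range(first_bin, last_bin + 1):
        bin_start = b * bin_size + 1
        bin_end = (b + 1) * bin_size
        overlap = min(end, bin_end) - max(start, bin_start) + 1
        yield (ref, b), overlap


def reduce_bin(key: BinKey, values: List[int], bin_size: int = BIN_SIZE) -> Tuple[BinKey, float]:
    """Reduce step: turn the covered-base counts of one bin into its mean coverage."""
    return key, sum(values) / bin_size


def coverage_summary(alignments: Iterable[Alignment], bin_size: int = BIN_SIZE) -> Dict[BinKey, float]:
    """Run map, group by key, and reduce in-process (a framework would do the grouping)."""
    grouped: Dict[BinKey, List[int]] = defaultdict(list)
    for aln in alignments:
        for key, bases in map_alignment(aln, bin_size):
            grouped[key].append(bases)
    return dict(reduce_bin(k, v, bin_size) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    reads = [("chr1", 1, 100), ("chr1", 51, 150), ("chr2", 10, 19)]
    for (ref, b), cov in coverage_summary(reads).items():
        print(ref, b * BIN_SIZE + 1, (b + 1) * BIN_SIZE, cov)
```

In a distributed setting the grouping done here by the `grouped` dictionary would be handled by the framework's shuffle phase, and the map function would receive alignment records read directly from BAM input splits rather than tuples.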
