Article

Hadoop-BAM: directly manipulating next generation sequencing data in the cloud

BIOINFORMATICS (2012)

Journal

BIOINFORMATICS

Volume 28, Issue 6, Pages 876-877

Publisher

OXFORD UNIV PRESS

DOI: 10.1093/bioinformatics/bts054

Categories

Biochemical Research Methods; Biotechnology & Applied Microbiology; Computer Science, Interdisciplinary Applications; Mathematical & Computational Biology; Statistics & Probability

Funding

Finnish Funding Agency for Technology and Innovation (Tekes)
Academy of Finland [139402]

Abstract

Hadoop-BAM is a novel library for the scalable manipulation of aligned next-generation sequencing data in the Hadoop distributed computing framework. It acts as an integration layer between analysis applications and BAM files that are processed using Hadoop. Hadoop-BAM solves the issues related to BAM data access by presenting a convenient API for implementing map and reduce functions that can directly operate on BAM records. It builds on top of the Picard SAM JDK, so tools that rely on the Picard API are expected to be easily convertible to support large-scale distributed processing. In this article we demonstrate the use of Hadoop-BAM by building a coverage summarizing tool for the Chipster genome browser. Our results show that Hadoop offers good scalability, and one should avoid moving data in and out of Hadoop between analysis steps.
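The map and reduce steps of a coverage summary like the one mentioned in the abstract can be illustrated with a minimal sketch. This is plain Python, not the Hadoop-BAM API and not the authors' implementation: the `Read` tuple, `BIN_SIZE`, `map_read`, `reduce_bin`, and `summarize_coverage` are hypothetical names chosen for illustration, and the in-memory grouping only stands in for the shuffle that Hadoop would perform between the two phases.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical stand-in for an aligned read:
# (reference name, 0-based start, aligned length in bases).
Read = Tuple[str, int, int]
# Key emitted by the map step: (reference name, bin index).
Key = Tuple[str, int]

BIN_SIZE = 100  # assumed summary resolution in bases


def map_read(read: Read, bin_size: int = BIN_SIZE) -> Iterator[Tuple[Key, int]]:
    """Map step: emit the number of bases the read contributes to each bin."""
    ref, start, length = read
    if length <= 0:
        return
    end = start + length  # exclusive end coordinate
    for b in range(start // bin_size, (end - 1) // bin_size + 1):
        bin_start = b * bin_size
        overlap = min(end, bin_start + bin_size) - max(start, bin_start)
        yield (ref, b), overlap


def reduce_bin(overlaps: Iterable[int], bin_size: int = BIN_SIZE) -> float:
    """Reduce step: mean coverage of one bin from its per-read overlaps."""
    return sum(overlaps) / bin_size


def summarize_coverage(reads: Iterable[Read],
                       bin_size: int = BIN_SIZE) -> Dict[Key, float]:
    """Run map, group by key (the 'shuffle'), then reduce each group."""
    grouped: Dict[Key, List[int]] = defaultdict(list)
    for read in reads:
        for key, value in map_read(read, bin_size):
            grouped[key].append(value)
    return {key: reduce_bin(values, bin_size)
            for key, values in sorted(grouped.items())}


if __name__ == "__main__":
    example_reads = [("chr1", 0, 50), ("chr1", 80, 40), ("chr2", 10, 100)]
    for (ref, b), cov in summarize_coverage(example_reads).items():
        print(f"{ref}\tbin {b}\tmean coverage {cov:.2f}")
```

In a real Hadoop job the map function would receive BAM records from the input format rather than tuples, and the grouping by key would be done by the framework across nodes; the sketch only shows why per-bin coverage decomposes naturally into independent map and reduce steps.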
