SeqPig: simple and scalable scripting for large sequencing data sets in Hadoop

Journal: Bioinformatics (2014), Volume 30, Issue 1, Pages 119-120

Publisher: Oxford University Press

DOI: 10.1093/bioinformatics/btt601

Funding

Finnish Strategic Centre for Science, Technology and Innovation DIGILE
Academy of Finland [139402]
Sardinian (Italy) [L7-2010/COBIK]
COST Action [BM1006]

Abstract

Hadoop MapReduce-based approaches have become increasingly popular because of their scalability in processing large sequencing datasets. However, because these methods typically require in-depth expertise in Hadoop and Java, they remain out of reach for many bioinformaticians. To address this problem, we have created SeqPig, a library and a collection of tools for manipulating, analyzing and querying sequencing datasets in a scalable and simple manner. SeqPig scripts use the Hadoop-based distributed scripting engine Apache Pig, which automatically parallelizes and distributes data processing tasks. We demonstrate SeqPig's scalability over many computing nodes and illustrate its use with example scripts.
