Journal
BIOINFORMATICS
Volume 30, Issue 1, Pages 119-120
Publisher
OXFORD UNIV PRESS
DOI: 10.1093/bioinformatics/btt601
Funding
- Finnish Strategic Centre for Science, Technology and Innovation DIGILE
- Academy of Finland [139402]
- Sardinian (Italy) [L7-2010/COBIK]
- COST Action [BM1006]
Hadoop MapReduce-based approaches have become increasingly popular due to their scalability in processing large sequencing datasets. However, as these methods typically require in-depth expertise in Hadoop and Java, they are still out of reach for many bioinformaticians. To solve this problem, we have created SeqPig, a library and a collection of tools to manipulate, analyze and query sequencing datasets in a scalable and simple manner. SeqPig scripts use the Hadoop-based distributed scripting engine Apache Pig, which automatically parallelizes and distributes data processing tasks. We demonstrate SeqPig's scalability over many computing nodes and illustrate its use with example scripts.