Hydra: a scalable proteomic search engine which utilizes the Hadoop distributed computing framework

BMC BIOINFORMATICS (2012)

Journal

BMC BIOINFORMATICS

Volume 13

Publisher

BMC

DOI: 10.1186/1471-2105-13-324

Funding

NIGMS [R01GM087221]
NCI [R01CA137442]
EU [260558]
major research instrumentation grant [0923536]
American Recovery and Reinvestment Act (ARRA) funds from National Institutes of Health National Human Genome Research Institute [R01 HG005805]
National Institute of General Medical Sciences [2P50 GM076547]
Luxembourg Centre for Systems Biomedicine
University of Luxembourg
National Science Foundation, Directorate for Biological Sciences, Division of Biological Infrastructure [923536]

Abstract

Background: For shotgun mass spectrometry based proteomics the most computationally expensive step is in matching the spectra against an increasingly large database of sequences and their post-translational modifications with known masses. Each mass spectrometer can generate data at an astonishingly high rate, and the scope of what is searched for is continually increasing. Therefore solutions for improving our ability to perform these searches are needed.

Results: We present a sequence database search engine that is specifically designed to run efficiently on the Hadoop MapReduce distributed computing framework. The search engine implements the K-score algorithm, generating comparable output for the same input files as the original implementation. The scalability of the system is shown, and the architecture required for the development of such distributed processing is discussed.

Conclusion: The software is scalable in its ability to handle a large peptide database, numerous modifications and large numbers of spectra. Performance scales with the number of processors in the cluster, allowing throughput to expand with the available resources.
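The abstract does not describe how the search is split into map and reduce steps, so the following is only an illustrative sketch of one common way a mass-based database search can be expressed in MapReduce style. It runs in plain Python without Hadoop. Every name in it (`mass_bin`, `map_peptide`, `map_spectrum`, `reduce_bin`, `search`), the bin width, the tolerance, and the toy masses are assumptions made for illustration. It only pairs spectra with candidate peptides by mass; it does not implement the K-score algorithm.

```python
from collections import defaultdict

# Hypothetical parameters, chosen only for this sketch.
BIN_WIDTH = 1.0   # width of a mass bin, in Da
TOLERANCE = 0.5   # precursor mass tolerance, in Da


def mass_bin(mass, width=BIN_WIDTH):
    """Return the integer bin index used as the shuffle key."""
    return int(mass // width)


def map_peptide(peptide, mass):
    """Map step for the database: emit each peptide under its own mass bin."""
    yield mass_bin(mass), ("peptide", peptide, mass)


def map_spectrum(spectrum_id, precursor_mass, tolerance=TOLERANCE):
    """Map step for the spectra: emit the spectrum under every bin that
    overlaps its tolerance window, so matches across bin edges are kept."""
    low = mass_bin(precursor_mass - tolerance)
    high = mass_bin(precursor_mass + tolerance)
    for key in range(low, high + 1):
        yield key, ("spectrum", spectrum_id, precursor_mass)


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle phase."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_bin(values, tolerance=TOLERANCE):
    """Reduce step: pair each spectrum with the peptides in the same bin
    whose mass lies within the tolerance. A real engine would score the
    fragment peaks here; this sketch only reports candidate pairs."""
    peptides = [v for v in values if v[0] == "peptide"]
    spectra = [v for v in values if v[0] == "spectrum"]
    for _, spectrum_id, precursor_mass in spectra:
        for _, peptide, mass in peptides:
            if abs(precursor_mass - mass) <= tolerance:
                yield spectrum_id, peptide


def search(peptides, spectra):
    """Run map, shuffle and reduce in-process and return sorted pairs."""
    pairs = []
    for peptide, mass in peptides:
        pairs.extend(map_peptide(peptide, mass))
    for spectrum_id, precursor_mass in spectra:
        pairs.extend(map_spectrum(spectrum_id, precursor_mass))
    matches = []
    for values in shuffle(pairs).values():
        matches.extend(reduce_bin(values))
    return sorted(matches)


# Toy inputs with made-up masses, not real peptide data.
toy_peptides = [("PEPTIDEA", 800.4), ("PEPTIDEB", 801.1), ("PEPTIDEC", 950.0)]
toy_spectra = [("s1", 800.8), ("s2", 949.8), ("s3", 1200.0)]
candidates = search(toy_peptides, toy_spectra)
```

Because each peptide is emitted under exactly one bin, a spectrum and peptide pair is reported at most once even though spectra are duplicated across neighbouring bins. Keying both inputs by mass is what lets the work be spread over many reducers, which is the kind of scaling with cluster size that the abstract reports.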
