Parallel workflow manager for non-parallel bioinformatic applications to solve large-scale biological problems on a supercomputer

JOURNAL OF BIOINFORMATICS AND COMPUTATIONAL BIOLOGY (2016)

Journal

JOURNAL OF BIOINFORMATICS AND COMPUTATIONAL BIOLOGY

Volume 14, Issue 2

Publisher

IMPERIAL COLLEGE PRESS

DOI: 10.1142/S0219720016410080

Keywords

Bioinformatics; computational biology; supercomputer cluster; large-scale data analysis; Message Passing Interface; programming languages; resource-intensive applications; mpiWrapper; Penicillin acylase

Funding

Russian Science Foundation [15-14-00069]

Abstract

The rapid expansion of online resources providing access to genomic, structural, and functional information on biological macromolecules opens an opportunity to gain a deeper understanding of the mechanisms of biological processes through systematic analysis of large datasets. This, however, requires novel strategies to make optimal use of computer processing power. Some methods in bioinformatics and molecular modeling require extensive computational resources. Other algorithms have fast implementations that take at most several hours to analyze a typical input on a modern desktop workstation; however, because they must be invoked many times for a large number of subtasks, the full task requires significant computing power. Therefore, an efficient computational solution to large-scale biological problems requires both a careful parallel implementation of resource-hungry methods and a smart workflow to manage multiple invocations of relatively fast algorithms. In this work, a new software tool, mpiWrapper, has been developed to accommodate non-parallel implementations of scientific algorithms within a parallel supercomputing environment. The Message Passing Interface (MPI) is used to exchange information between nodes. Two specialized threads, one for task management and communication and another for subtask execution, are invoked on each processing unit to avoid deadlock while using blocking calls to MPI. mpiWrapper can launch all conventional Linux applications without modification of their original source code and supports resubmission of subtasks on node failure. We show that this approach can be used to process huge amounts of biological data efficiently by running non-parallel programs in parallel mode on a supercomputer. The C++ source code and documentation are available from http://biokinet.belozersky.msu.ru/mpiWrapper.
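The two-thread arrangement described in the abstract can be illustrated with a minimal C++ sketch. This is not the mpiWrapper code or its API: every name here (`Channel`, `Subtask`, `run_subtasks`, `max_attempts`) is hypothetical, and the MPI message exchange is deliberately replaced by in-process queues so that the example compiles without an MPI library. It only shows the general idea, under those assumptions: a manager thread hands out subtasks and resubmits failed ones, while a separate executor thread blocks on the external program, so that neither blocking operation stalls the other.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdlib>
#include <deque>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical record for one invocation of an unmodified command-line program.
struct Subtask {
    std::string command;  // shell command line to run
    int attempts;         // number of launches so far
    bool ok;              // true if the last launch exited with status 0
};

// Minimal blocking queue; a stand-in (assumption) for blocking MPI send/receive.
template <typename T>
class Channel {
public:
    void push(T value) {
        { std::lock_guard<std::mutex> lock(m_); q_.push_back(std::move(value)); }
        cv_.notify_one();
    }
    // Blocks until an item arrives; returns false once closed and drained.
    bool pop(T& out) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return closed_ || !q_.empty(); });
        if (q_.empty()) return false;
        out = std::move(q_.front());
        q_.pop_front();
        return true;
    }
    void close() {
        { std::lock_guard<std::mutex> lock(m_); closed_ = true; }
        cv_.notify_all();
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::deque<T> q_;
    bool closed_ = false;
};

// Runs all commands with one manager thread and one executor thread.
// A failed subtask is resubmitted until it has been launched max_attempts times.
std::vector<Subtask> run_subtasks(const std::vector<std::string>& commands,
                                  int max_attempts = 2) {
    assert(max_attempts >= 1);
    Channel<Subtask> todo;  // manager -> executor
    Channel<Subtask> done;  // executor -> manager
    std::vector<Subtask> finished;

    // Executor thread: blocks in std::system while the external program runs.
    std::thread executor([&] {
        Subtask t{};
        while (todo.pop(t)) {
            ++t.attempts;
            t.ok = (std::system(t.command.c_str()) == 0);
            done.push(t);
        }
    });

    // Manager thread: distributes subtasks and handles resubmission.
    std::thread manager([&] {
        for (const auto& c : commands) todo.push(Subtask{c, 0, false});
        Subtask t{};
        while (finished.size() < commands.size() && done.pop(t)) {
            if (!t.ok && t.attempts < max_attempts) todo.push(t);  // resubmit
            else finished.push_back(t);
        }
        todo.close();  // no more work: lets the executor thread exit
    });

    manager.join();
    executor.join();
    return finished;
}
```

Calling `run_subtasks({"exit 0", "exit 3"}, 2)` on a Linux system runs each command through the shell and returns one record per command, with the failing command launched twice before it is reported as failed. In a real MPI setting the queues would be replaced by messages between processes, which typically also requires initializing MPI with a suitable thread-support level.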
