Article

A heterogeneous parallel implementation of the Markov clustering algorithm for large-scale biological networks on distributed CPU-GPU clusters

JOURNAL OF SUPERCOMPUTING (2022)

Journal

JOURNAL OF SUPERCOMPUTING

Volume 78, Issue 7, Pages 9017-9037

Publisher

SPRINGER

DOI: 10.1007/s11227-021-04204-6

Keywords

Biological interaction network; Heterogeneous computing; Cluster computing; Parallel computing; Compute Unified Device Architecture

Funding

National Key Research and Development Program of China [2017YFB0202002]

Automated Summary

This paper proposes a heterogeneous parallel implementation of the Markov clustering (MCL) algorithm that accelerates clustering tasks on distributed CPU-GPU clusters. The implementation uses both CPUs and GPUs and is efficient in GPU memory usage and inter-node data transmission. Compared with its serial counterpart, it achieves a best speedup of 70.02 times.

Abstract

Biological interaction databases store information about interacting proteins or genes. Clustering the networks formed by these interactions to find highly connected regions can reveal functional affinities or structural similarities between protein or gene entities. As the amount of information in these databases keeps growing, the runtime of clustering tasks is becoming prohibitive. In this paper, we propose a heterogeneous parallel algorithm that accelerates clustering tasks on distributed CPU-GPU clusters. Our parallel implementation is based on the original serial Markov clustering (MCL) algorithm and uses both CPUs and GPUs to exploit the power of heterogeneous platforms. Using the BioGRID biological interaction database, we tested the proposed algorithm on a computer cluster equipped with NVIDIA Tesla P100 GPU accelerators. The results show that the algorithm is efficient in GPU memory usage and inter-node data transmission, and that it completes the clustering task in 3.2 minutes, with a best speedup of 70.02 times over the serial counterpart. We believe our work can provide key insights for realizing fast MCL analyses of large-scale biological data on distributed CPU-GPU computer clusters.
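For readers unfamiliar with MCL, the serial algorithm that this work parallelizes repeatedly applies two operations to a column-stochastic matrix built from the network: expansion (raising the matrix to a power, squaring it here) and inflation (raising each entry to a power r and renormalizing the columns). The following is a minimal dense Python sketch of that textbook loop, assuming an undirected, unweighted adjacency matrix. The function names, the inflation value of 2.0, the pruning threshold and the convergence tolerance are illustrative assumptions, not the authors' settings, and this is not their distributed CPU-GPU implementation.

```python
from typing import List

Matrix = List[List[float]]


def normalize_columns(m: Matrix) -> Matrix:
    """Scale every column so that it sums to 1 (column-stochastic)."""
    n = len(m)
    out = [row[:] for row in m]
    for j in range(n):
        s = sum(m[i][j] for i in range(n))
        if s > 0:
            for i in range(n):
                out[i][j] = m[i][j] / s
    return out


def expand(m: Matrix) -> Matrix:
    """Expansion with power e = 2: the dense matrix product m @ m."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m: Matrix, r: float) -> Matrix:
    """Inflation: raise each entry to the power r, then renormalize columns."""
    return normalize_columns([[x ** r for x in row] for row in m])


def mcl(adj: List[List[int]], inflation: float = 2.0, max_iter: int = 100,
        prune: float = 1e-5, tol: float = 1e-8) -> List[List[int]]:
    """Hypothetical helper: cluster an undirected graph given as a 0/1 matrix."""
    n = len(adj)
    # Add self-loops, then build the column-stochastic transition matrix.
    m = normalize_columns([[float(adj[i][j]) + (1.0 if i == j else 0.0)
                            for j in range(n)] for i in range(n)])
    for _ in range(max_iter):
        prev = m
        m = inflate(expand(m), inflation)
        # Zero out tiny entries, then renormalize so columns still sum to 1.
        m = normalize_columns([[x if x > prune else 0.0 for x in row] for row in m])
        if all(abs(m[i][j] - prev[i][j]) < tol for i in range(n) for j in range(n)):
            break
    # Attractor rows keep nonzero entries; their nonzero columns form a cluster.
    clusters = {tuple(j for j in range(n) if m[i][j] > 0) for i in range(n)}
    return sorted(list(c) for c in clusters if c)
```

For example, a graph made of two disconnected triangles (nodes 0, 1, 2 and 3, 4, 5) yields the two clusters `[[0, 1, 2], [3, 4, 5]]`. Expansion is a matrix product and is generally the most expensive step of MCL on large networks, which is why parallel implementations usually concentrate on it and store the matrix in a sparse format; the abstract does not detail how this paper partitions that work between CPUs and GPUs.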
