4.5 Article

Efficient breadth first search on multi-GPU systems

期刊

JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING
卷 73, 期 9, 页码 1292-1305

出版社

ACADEMIC PRESS INC ELSEVIER SCIENCE
DOI: 10.1016/j.jpdc.2013.05.007

关键词

GPU; CUDA; BFS; Distributed algorithm; Large graphs; Graph 500 benchmark

向作者/读者索取更多资源

Simple algorithms for the execution of a Breadth First Search on large graphs lead, running on clusters of GPUs, to a situation of load unbalance among threads and un-coalesced memory accesses, resulting in pretty low performances. To obtain a significant improvement on a single GPU and to scale by using multiple GPUs, we resort to a suitable combination of operations to rearrange data before processing them. We propose a novel technique for mapping threads to data that achieves a perfect load balance by leveraging prefix-sum and binary search operations. To reduce the communication overhead, we perform a pruning operation on the set of edges that needs to be exchanged at each BFS level. The result is an algorithm that exploits at its best the parallelism available on a single GPU and minimizes communication among GPUs. We show that a cluster of GPUs can efficiently perform a distributed BFS on graphs with billions of nodes. (C) 2013 Elsevier Inc. All rights reserved.

作者

我是这篇论文的作者
点击您的名字以认领此论文并将其添加到您的个人资料中。

评论

主要评分

4.5
评分不足

次要评分

新颖性
-
重要性
-
科学严谨性
-
评价这篇论文

推荐

暂无数据
暂无数据