4.7 Article

pGraph: Efficient Parallel Construction of Large-Scale Protein Sequence Homology Graphs

Journal

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TPDS.2012.19

Keywords

Parallel protein sequence homology detection; parallel sequence graph construction; hierarchical master-worker paradigm; producer-consumer model

Funding

  1. US National Science Foundation (NSF) [IIS-0916463]
  2. DOE [57271, 54976]
  3. Department of Energy's Office of Biological and Environmental Research
  4. Directorate for Computer & Information Science & Engineering
  5. Division of Information & Intelligent Systems [0916463] (funding source: National Science Foundation)


Detecting homology between protein sequences is a fundamental problem in computational molecular biology, with applications in nearly all analyses that aim to characterize protein molecules structurally and functionally. While detecting homology between two protein sequences is relatively inexpensive, detecting pairwise homology across a large number of sequences can become computationally prohibitive for modern inputs, often requiring millions of CPU hours. Yet there is currently no robust support for parallelizing this kernel. In this paper, we identify the key characteristics that make this problem particularly hard to parallelize, and then propose a new parallel algorithm suited to detecting homology on large data sets using distributed-memory parallel computers. Our method, called pGraph, is a novel hybrid of the hierarchical multiple-master/worker model and the producer-consumer model, and is designed to break the irregularities imposed by alignment computation and work generation. Experimental results show that pGraph achieves linear scaling on a 2,048-processor distributed-memory cluster for inputs ranging from 20,000 to 2,560,000 sequences. In addition to demonstrating strong scaling, we present an extensive report on the performance of the various system components and related parametric studies.
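The producer-consumer half of the design described in the abstract can be illustrated with a small single-process sketch. This is not the pGraph implementation. It uses Python threads rather than a distributed-memory cluster and leaves out the hierarchy of masters entirely. The shared k-mer filter for generating candidate pairs, the toy scoring scheme, and every name and parameter (`shared_kmer`, `local_alignment_score`, `build_homology_graph`, `threshold`, `buffer_size`) are assumptions made for illustration only.

```python
import queue
import threading
from itertools import combinations


def shared_kmer(a, b, k=3):
    """Cheap filter (an assumption): do the two sequences share an exact k-mer?"""
    kmers = {a[i:i + k] for i in range(len(a) - k + 1)}
    return any(b[i:i + k] in kmers for i in range(len(b) - k + 1))


def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman local alignment score with a linear gap penalty (toy scoring)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def build_homology_graph(seqs, n_consumers=4, threshold=8, buffer_size=16):
    """Return sorted edges (i, j) whose alignment score reaches the threshold."""
    # The bounded buffer decouples work generation from alignment, so neither
    # stage has to run in lockstep with the other.
    work = queue.Queue(maxsize=buffer_size)
    edges = set()
    lock = threading.Lock()

    def producer():
        # Work generation: emit only the pairs that pass the cheap filter.
        for i, j in combinations(range(len(seqs)), 2):
            if shared_kmer(seqs[i], seqs[j]):
                work.put((i, j))
        for _ in range(n_consumers):
            work.put(None)  # one stop marker per consumer

    def consumer():
        # Alignment: the expensive step, run on whatever pair arrives next.
        while True:
            item = work.get()
            if item is None:
                return
            i, j = item
            if local_alignment_score(seqs[i], seqs[j]) >= threshold:
                with lock:
                    edges.add((i, j))

    threads = [threading.Thread(target=producer)]
    threads += [threading.Thread(target=consumer) for _ in range(n_consumers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(edges)


if __name__ == "__main__":
    seqs = ["MKTAYIAKQR", "MKTAYIAKQL", "GGGGSGGGGS", "MKTAYLAKQR"]
    print(build_homology_graph(seqs))
```

Because alignment cost varies with sequence length and the number of candidate pairs per sequence is uneven, consumers pull pairs from the queue as they become free instead of receiving a fixed share up front. Python threads do not speed up this CPU-bound loop; the sketch shows only the structure of the two stages.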

