Article

PEGASUS: mining peta-scale graphs

Journal

KNOWLEDGE AND INFORMATION SYSTEMS
Volume 27, Issue 2, Pages 303-325

Publisher

SPRINGER LONDON LTD
DOI: 10.1007/s10115-010-0305-0

Keywords

PEGASUS; Graph mining; GIM-V; Generalized iterative matrix-vector multiplication; Hadoop

Funding

  1. National Science Foundation [IIS-0705359, IIS-0808661]
  2. U.S. Department of Energy by University of California Lawrence Livermore National Laboratory [DE-AC52-07NA27344 (LLNL-CONF-404625), B579447, B580840]
  3. Division of Information & Intelligent Systems
  4. Directorate for Computer & Information Science & Engineering [0808661]; funding source: National Science Foundation

In this paper, we describe PeGaSus, an open-source Peta Graph Mining library that performs typical graph mining tasks such as computing the diameter of the graph, computing the radius of each node, finding the connected components, and computing the importance score of nodes. As graphs grow to several giga-, tera-, or petabytes, the need for such a library grows too. To the best of our knowledge, PeGaSus is the first such library, implemented on top of the Hadoop platform, the open-source version of MapReduce. Many graph mining operations (PageRank, spectral clustering, diameter estimation, connected components, etc.) are essentially repeated matrix-vector multiplications. We describe a core primitive of PeGaSus, called GIM-V (generalized iterated matrix-vector multiplication). GIM-V is highly optimized, achieving (a) good scale-up with the number of available machines, (b) running time linear in the number of edges, and (c) more than 5 times faster performance than the non-optimized version of GIM-V. Our experiments ran on M45, one of the top 50 supercomputers in the world. We report our findings on several real graphs, including one of the largest publicly available Web graphs, courtesy of Yahoo!, with approximately 6.7 billion edges.
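
To make "repeated matrix-vector multiplication, generalized" concrete, here is a minimal single-machine Python sketch. It assumes GIM-V is parameterized by three user-supplied operations, called combine2 (the "multiply" of one matrix entry with one vector element), combine_all (the "sum" over a row), and assign (how the new value replaces the old one). These names and signatures are illustrative assumptions, not the PeGaSus API, and the sketch does not use Hadoop or MapReduce. Two instances are shown: connected components by min-label propagation, and PageRank with damping factor c.

```python
from typing import Any, Callable, List, Sequence, Tuple

# A sparse matrix entry (i, j, m_ij): row i, column j, value m_ij.
Edge = Tuple[int, int, Any]


def gim_v(
    edges: Sequence[Edge],
    v: List[Any],
    combine2: Callable[[Any, Any], Any],
    combine_all: Callable[[int, List[Any]], Any],
    assign: Callable[[Any, Any], Any],
) -> List[Any]:
    """One generalized step (assumed form):
    v'[i] = assign(v[i], combine_all(n, [combine2(m_ij, v[j]) for each entry in row i]))."""
    n = len(v)
    partials: List[List[Any]] = [[] for _ in range(n)]
    for i, j, m_ij in edges:
        partials[i].append(combine2(m_ij, v[j]))  # generalized "multiply"
    # Generalized "sum" over each row, then the update rule.
    return [assign(v[i], combine_all(n, partials[i])) for i in range(n)]


def connected_components(n: int, undirected_edges, max_iter: int = 100) -> List[int]:
    # Store each undirected edge in both directions with m_ij = 1.
    edges = [(a, b, 1) for a, b in undirected_edges] + [(b, a, 1) for a, b in undirected_edges]
    labels = list(range(n))  # each node starts labeled with its own id
    for _ in range(max_iter):
        new_labels = gim_v(
            edges, labels,
            combine2=lambda m, vj: vj,                               # pass the neighbor's label
            combine_all=lambda n_, xs: min(xs, default=None),        # smallest neighbor label
            assign=lambda vi, x: vi if x is None else min(vi, x),    # keep the smaller label
        )
        if new_labels == labels:  # fixed point: no label changed
            break
        labels = new_labels
    return labels


def pagerank(n: int, directed_edges, c: float = 0.85, iters: int = 50) -> List[float]:
    out_deg = [0] * n
    for src, _ in directed_edges:
        out_deg[src] += 1
    # Column-normalized adjacency: m_ij = 1 / outdeg(j) for an edge j -> i.
    edges = [(dst, src, 1.0 / out_deg[src]) for src, dst in directed_edges]
    ranks = [1.0 / n] * n
    for _ in range(iters):
        ranks = gim_v(
            edges, ranks,
            combine2=lambda m, vj: c * m * vj,
            combine_all=lambda n_, xs: (1 - c) / n_ + sum(xs),
            assign=lambda vi, x: x,  # the new score simply replaces the old one
        )
    return ranks
```

For example, connected_components(6, [(0, 1), (1, 2), (3, 4)]) returns [0, 0, 0, 3, 3, 5]: every node ends up labeled with the smallest node id in its component. Only the three operations differ between the two tasks, while the row-wise "multiply, then combine" loop is shared. That shared loop is the part a distributed implementation would need to optimize.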