Article

Massively Parallel Algorithms for Personalized PageRank

PROCEEDINGS OF THE VLDB ENDOWMENT (2021)

Journal

PROCEEDINGS OF THE VLDB ENDOWMENT

Volume 14, Issue 9, Pages 1668-1680

Publisher

ASSOC COMPUTING MACHINERY

DOI: 10.14778/3461535.3461554

Categories

Computer Science, Information Systems; Computer Science, Theory & Methods

Funding

Hong Kong RGC ECS [24203419]
RGC CRF [C4158-20G]
CUHK Direct Grant [4055114]
NSFC [U1936205]
National Natural Science Foundation of China [61972401, 61932001, 61832017]
Beijing Outstanding Young Scientist Program [BJJWZYJH012019100020098]
Alibaba Group
Public Computing Cloud, Renmin University of China

Abstract

Personalized PageRank (PPR) has wide applications in search engines, social recommendations, community detection, and so on. Nowadays, graphs are becoming massive, and many IT companies need to deal with large graphs that cannot fit into the memory of most commodity servers. However, most existing state-of-the-art solutions for PPR computation are designed for a single machine and are inefficient in a distributed framework, since they either (i) require an excessively large number of communication rounds, or (ii) incur high communication costs in each round. Motivated by this, we present Delta-Push, an efficient framework for single-source and top-k PPR queries in distributed settings. Our goal is to reduce the number of rounds while guaranteeing that the load, i.e., the maximum number of messages an executor sends or receives in a round, is bounded by the capacity of each executor. We first present a non-trivial combination of a redesigned parallel push algorithm and the Monte-Carlo method to answer single-source PPR queries. The solution uses pre-sampled random walks to reduce the number of rounds for the push algorithm. Theoretical analysis under the Massively Parallel Computing (MPC) model shows that our proposed solution bounds the communication rounds to $O\left(\log \frac{n^2 \log n}{\epsilon^2 m}\right)$ under a load of $O(m/p)$, where $n$ is the number of nodes and $m$ the number of edges of the input graph, $p$ is the number of executors, and $\epsilon$ is a user-defined error parameter. Meanwhile, as the number of executors increases to $p' = \gamma \cdot p$, the load constraint can be relaxed, since each executor can hold $O(\gamma \cdot m/p')$ messages with invariant local memory. In such scenarios, multiple queries can be processed simultaneously in batches. We show that with a load of $O(\gamma \cdot m/p')$, Delta-Push can process $\gamma$ queries in a batch with $O\left(\log \frac{n^2 \log n}{\gamma \epsilon^2 m}\right)$ rounds, while other baseline solutions still incur the same round cost for each batch.
We further present a new top-k algorithm that is friendly to the distributed framework and reduces the number of rounds required in practice. Extensive experiments show that our proposed solution is more efficient than alternatives.
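
The abstract combines a push algorithm with Monte-Carlo estimation driven by random walks sampled ahead of time. The sketch below illustrates only that general combination, on a single machine. It is not the authors' Delta-Push and does not reproduce the paper's load or round analysis. Several assumptions are illustrative. Forward push runs in synchronous rounds, so every active node pushes at once and one loop iteration stands in for one communication round. Walks are α-terminating and pre-sampled per node with the same α used by the push. A dangling node is treated as a self-loop. `graph` is a dict that maps every node to its out-neighbours. The function names `forward_push_rounds`, `presample_walks`, and `single_source_ppr` are hypothetical.

```python
import random
from collections import defaultdict


def forward_push_rounds(graph, source, alpha, r_max, max_rounds=10_000):
    """Synchronous forward push: all active nodes push in the same round.

    Maintains, for every target t:
        pi(source, t) = reserve[t] + sum_v residue[v] * pi(v, t)
    """
    reserve = defaultdict(float)
    residue = defaultdict(float)
    residue[source] = 1.0
    rounds = 0
    while rounds < max_rounds:
        # A node is active if its residue exceeds r_max times its out-degree.
        active = [u for u, r in residue.items()
                  if r > r_max * max(len(graph[u]), 1)]
        if not active:
            break
        rounds += 1
        # Snapshot active residues first, so every push in this round reads
        # the state at the start of the round (one simulated "round").
        amounts = {u: residue[u] for u in active}
        for u in active:
            residue[u] = 0.0
        for u, amount in amounts.items():
            reserve[u] += alpha * amount
            neighbors = graph[u] or [u]  # assumption: dangling node = self-loop
            share = (1 - alpha) * amount / len(neighbors)
            for v in neighbors:
                residue[v] += share
    return reserve, residue, rounds


def walk_end(graph, start, alpha, rng):
    """Endpoint of an alpha-terminating random walk (stops w.p. alpha per step)."""
    node = start
    while rng.random() >= alpha:
        node = rng.choice(graph[node] or [node])
    return node


def presample_walks(graph, alpha, walks_per_node, seed=0):
    """Offline phase: store walk endpoints for every node, independent of queries."""
    rng = random.Random(seed)
    return {u: [walk_end(graph, u, alpha, rng) for _ in range(walks_per_node)]
            for u in graph}


def single_source_ppr(graph, source, walks, alpha=0.2, r_max=1e-4):
    """Push phase, then spread each leftover residue over the pre-sampled walks."""
    reserve, residue, rounds = forward_push_rounds(graph, source, alpha, r_max)
    estimate = defaultdict(float, reserve)
    for v, r in residue.items():
        if r > 0:
            # Fraction of v's walks ending at t is an unbiased estimate of pi(v, t).
            for t in walks[v]:
                estimate[t] += r / len(walks[v])
    return dict(estimate), rounds


if __name__ == "__main__":
    graph = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}  # toy directed graph
    walks = presample_walks(graph, alpha=0.2, walks_per_node=500)
    for r_max in (1e-2, 1e-4):
        estimate, rounds = single_source_ppr(graph, 0, walks, alpha=0.2, r_max=r_max)
        print(f"r_max={r_max}: {rounds} rounds, ppr={estimate}")
```

A larger `r_max` stops the push phase after fewer rounds and leaves more residue for the pre-sampled walks to resolve. This reflects the trade-off the abstract describes, where pre-sampled random walks reduce the number of push rounds. Because the walks are sampled once, independently of the query source, the same `walks` table can serve many queries.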
