Parallelizing approximate single-source personalized PageRank queries on shared memory

Wang, Runhui; Wang, Sibo; Zhou, Xiaofang

doi:10.1007/s00778-019-00576-7

Regular Paper
Published: 08 October 2019

Volume 28, pages 923–940 (2019)

The VLDB Journal

Abstract

Given a directed graph G, a source node s, and a target node t, the personalized PageRank (PPR) \(\pi (s,t)\) measures the importance of node t with respect to node s. In this work, we study the single-source PPR query, which takes a source node s as input and outputs the PPR values of all nodes in G with respect to s. The single-source PPR query has many important applications, e.g., community detection and recommendation. Deriving exact answers for single-source PPR queries is prohibitively expensive, so most existing work focuses on approximate solutions. Nevertheless, existing approximate solutions are still inefficient, and it remains challenging to answer single-source PPR queries efficiently for online applications. This motivates us to devise efficient parallel algorithms that run on shared-memory multi-core systems. We show how to efficiently parallelize FORA, the state-of-the-art index-based solution, and analyze the complexity of the resulting parallel algorithms. We prove that our proposed algorithm achieves a time complexity of \(O(W/P+\log ^2{n})\), where W is the time complexity of the sequential FORA algorithm, P is the number of processors used, and n is the number of nodes in the graph. FORA consists of a forward push phase and a random walk phase, and we present optimization techniques for both phases, including effective maintenance of active nodes, more efficient memory access, and cache-aware scheduling. Extensive experimental evaluation demonstrates that our solution achieves up to 37\(\times \) speedup on 40 cores and is 3.3\(\times \) faster than alternatives on 40 cores. Moreover, forward push alone can be used for local graph clustering, and our parallel algorithm for forward push is 4.8\(\times \) faster than existing parallel alternatives.
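To make the forward push phase mentioned in the abstract concrete, the following is a minimal sequential sketch in Python of the textbook local push procedure, not the parallel algorithm of this paper. The function name `forward_push`, the adjacency-dict graph format, the parameters `alpha` (teleport probability) and `r_max` (push threshold), and the handling of nodes without out-neighbors (treated as absorbing) are all assumptions made for illustration.

```python
from collections import deque


def forward_push(out_neighbors, source, alpha=0.2, r_max=1e-3):
    """Sequential forward push sketch (hypothetical helper, not the paper's code).

    out_neighbors: dict mapping each node to a list of its out-neighbors.
    Returns (reserve, residue): reserve[v] is a lower estimate of pi(source, v),
    residue[v] is probability mass that has not been pushed yet.
    """
    reserve = {}
    residue = {source: 1.0}

    def degree(v):
        return len(out_neighbors.get(v, ()))

    def is_active(v):
        # A node is pushed while its residue per out-edge exceeds r_max.
        # Assumption: nodes without out-neighbors are treated as degree 1.
        return residue.get(v, 0.0) > r_max * max(degree(v), 1)

    queue = deque([source])
    queued = {source}
    while queue:
        v = queue.popleft()
        queued.discard(v)
        if not is_active(v):
            continue
        r_v = residue[v]
        residue[v] = 0.0
        d = degree(v)
        if d == 0:
            # Assumption: a walk that reaches a dead end stops there.
            reserve[v] = reserve.get(v, 0.0) + r_v
            continue
        # Keep an alpha fraction at v, spread the rest over out-neighbors.
        reserve[v] = reserve.get(v, 0.0) + alpha * r_v
        share = (1.0 - alpha) * r_v / d
        for u in out_neighbors[v]:
            residue[u] = residue.get(u, 0.0) + share
            if u not in queued and is_active(u):
                queue.append(u)
                queued.add(u)
    return reserve, residue


if __name__ == "__main__":
    graph = {0: [1, 2], 1: [2], 2: [0]}
    reserve, residue = forward_push(graph, source=0)
    print(reserve)
    print(residue)
```

Each push preserves the total mass, so the reserves and residues always sum to 1. In FORA the leftover residues are then refined by random walks, which this sketch omits.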

Notes

The grainsize is set to 128 according to Leiserson et al. [23].
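As a rough illustration of what a grain size controls, the sketch below splits a loop into chunks of at most 128 iterations and hands each chunk to a worker as one task. It is written in Python with `concurrent.futures` purely for readability; the helper name `chunked_parallel_map` is hypothetical, the flat chunking is a simplification of the recursive loop splitting used by Cilk-style runtimes, and Python threads will not show the multi-core speedups discussed in the article.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked_parallel_map(func, items, grainsize=128, max_workers=4):
    """Apply func to every item, scheduling work in chunks of `grainsize`.

    Hypothetical helper: a larger grainsize means fewer, bigger tasks
    (less scheduling overhead); a smaller one means finer load balancing.
    """
    chunks = [items[i:i + grainsize] for i in range(0, len(items), grainsize)]

    def run_chunk(chunk):
        # Each task processes its chunk sequentially.
        return [func(x) for x in chunk]

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves chunk order, so results stay aligned with items.
        parts = list(pool.map(run_chunk, chunks))
    return [y for part in parts for y in part]


if __name__ == "__main__":
    squares = chunked_parallel_map(lambda x: x * x, list(range(1000)))
    print(squares[:5])
```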

References

Andersen, R., Borgs, C., Chayes, J., Hopcroft, J., Mirrokni, V., Teng, S.-H.: Local computation of pagerank contributions. In: WAW, pp. 150–165 (2007)
Andersen, R., Chung, F.R.K., Lang, K.J.: Local graph partitioning using pagerank vectors. In: FOCS, pp. 475–486 (2006)
Bahmani, B., Chakrabarti, K., Xin, D.: Fast personalized pagerank on mapreduce. In: SIGMOD, pp. 973–984 (2011)
Bahmani, B., Chowdhury, A., Goel, A.: Fast incremental and personalized pagerank. PVLDB 4(3), 173–184 (2010)
Beamer, S., Asanović, K., Patterson, D.: Direction-optimizing breadth-first search. Sci. Program. 21(3–4), 137–148 (2013)
Brent, R.P.: The parallel evaluation of general arithmetic expressions. J. ACM 21(2), 201–206 (1974)
Chakrabarti, D., Zhan, Y., Faloutsos, C.: R-mat: a recursive model for graph mining. In: Proceedings of the 2004 SIAM International Conference on Data Mining, pp. 442–446. SIAM (2004)
Cohen, E.: Size-estimation framework with applications to transitive closure and reachability. J. Comput. Syst. Sci. 55(3), 441–453 (1997)
Cormen, T.H., Leiserson, C.E., Rivest, R.L., Stein, C.: Introduction to Algorithms, 3rd edn. MIT Press, Cambridge (2009)
Coskun, M., Grama, A., Koyutürk, M.: Efficient processing of network proximity queries via chebyshev acceleration. In: SIGKDD, pp. 1515–1524 (2016)
Dagum, L., Menon, R.: Openmp: an industry standard API for shared-memory programming. IEEE Comput. Sci. Eng. 5(1), 46–55 (1998)
Fogaras, D., Rácz, B., Csalogány, K., Sarlós, T.: Towards scaling fully personalized pagerank: algorithms, lower bounds, and experiments. Internet Math. 2(3), 333–358 (2005)
Fujiwara, Y., Nakatsuji, M., Onizuka, M., Kitsuregawa, M.: Fast and exact top-k search for random walk with restart. PVLDB 5(5), 442–453 (2012)
Fujiwara, Y., Nakatsuji, M., Shiokawa, H., Mishima, T., Onizuka, M.: Efficient ad-hoc search for personalized pagerank. In: SIGMOD, pp. 445–456 (2013)
Fujiwara, Y., Nakatsuji, M., Yamamuro, T., Shiokawa, H., Onizuka, M.: Efficient personalized pagerank with accuracy assurance. In: SIGKDD, pp. 15–23 (2012)
Guo, T., Cao, X., Cong, G., Lu, J., Lin, X.: Distributed algorithms on exact personalized pagerank. In: SIGMOD, pp. 479–494 (2017)
Guo, W., Li, Y., Sha, M., Tan, K.-L.: Parallel personalized pagerank on dynamic graphs. PVLDB 11(1), 93–106 (2017)
Gupta, M., Pathak, A., Chakrabarti, S.: Fast algorithms for topk personalized pagerank queries. In: WWW, pp. 1225–1226 (2008)
Gupta, P., Goel, A., Lin, J.J., Sharma, A., Wang, D., Zadeh, R.: WTF: the who to follow service at twitter. In: WWW, pp. 505–514 (2013)
https://www.cilkplus.org/ (2018)
Jeh, G., Widom, J.: Scaling personalized web search. In: WWW, pp. 271–279 (2003)
Jung, J., Park, N., Sael, L., Kang, U.: Bepi: fast and memory-efficient method for billion-scale random walk with restart. In: SIGMOD, pp. 789–804 (2017)
Leiserson, C.E., Schardl, T.B.: A work-efficient parallel breadth-first search algorithm (or how to cope with the nondeterminism of reducers). In: SPAA, pp. 303–314 (2010)
Lin, W.: Distributed algorithms for fully personalized pagerank on large graphs. In: WWW, pp. 1084–1094 (2019)
Liu, D.C., Rogers, S., Shiau, R., Kislyuk, D., Ma, K.C., Zhong, Z., Liu, J., Jing, Y.: Related pins at pinterest: the evolution of a real-world recommender system. In: WWW, pp. 583–592 (2017)
Lofgren, P., Banerjee, S., Goel, A.: Personalized pagerank estimation and search: a bidirectional approach. In: WSDM, pp. 163–172 (2016)
Nguyen, P., Tomeo, P., Noia, T.D., Sciascio, E.D.: An evaluation of simrank and personalized pagerank to build a recommender system for the web of data. In: WWW, pp. 1477–1482 (2015)
Page, L., Brin, S., Motwani, R., Winograd, T.: The pagerank citation ranking: bringing order to the web. Technical report, Stanford InfoLab (1999)
Park, H., Jung, J., Kang, U.: A comparative study of matrix factorization and random walk with restart in recommender systems. In: BigData, pp. 756–765 (2017)
Shin, K., Jung, J., Sael, L., Kang, U.: BEAR: block elimination approach for random walk with restart on large graphs. In: SIGMOD, pp. 1571–1585 (2015)
Shun, J., Blelloch, G.E.: Ligra: a lightweight graph processing framework for shared memory. In: PPoPP, pp. 135–146 (2013)
Shun, J., Blelloch, G.E.: Phase-concurrent hash tables for determinism. In: SPAA, pp. 96–107 (2014)
Shun, J., Roosta-Khorasani, F., Fountoulakis, K., Mahoney, M.W.: Parallel local graph clustering. PVLDB 9(12), 1041–1052 (2016)
Wang, S., Tang, Y., Xiao, X., Yang, Y., Li, Z.: Hubppr: effective indexing for approximate personalized pagerank. Proc. VLDB Endow. 10(3), 205–216 (2016)
Wang, S., Tao, Y.: Efficient algorithms for finding approximate heavy hitters in personalized pageranks. In: SIGMOD, pp. 1113–1127 (2018)
Wang, S., Yang, R., Xiao, X., Wei, Z., Yang, Y.: FORA: simple and effective approximate single-source personalized pagerank. In: SIGKDD, pp. 505–514 (2017)
Wei, H., Yu, J.X., Lu, C., Lin, X.: Speedup graph processing by graph ordering. In: SIGMOD, pp. 1813–1828 (2016)
Wei, Z., He, X., Xiao, X., Wang, S., Shang, S., Wen, J.-R.: Topppr: top-k personalized pagerank queries with precision guarantees on large graphs. In: SIGMOD, pp. 441–456 (2018)
Whang, J.J., Gleich, D.F., Dhillon, I.S.: Overlapping community detection using neighborhood-inflated seed expansion. IEEE Trans. Knowl. Data Eng. 28(5), 1272–1284 (2016)
Yin, H., Benson, A.R., Leskovec, J., Gleich, D.F.: Local higher-order graph clustering. In: SIGKDD, pp. 555–564 (2017)
Zhang, H., Lofgren, P., Goel, A.: Approximate personalized pagerank on dynamic graphs. In: SIGKDD, pp. 1315–1324 (2016)
Zhu, F., Fang, Y., Chang, K.C., Ying, J.: Incremental and accuracy-aware personalized pagerank through scheduled approximation. PVLDB 6(6), 481–492 (2013)

Author information

Authors and Affiliations

University of Queensland, Brisbane, Australia
Runhui Wang & Xiaofang Zhou
The Chinese University of Hong Kong, Sha Tin, Hong Kong, China
Sibo Wang

Corresponding author

Correspondence to Runhui Wang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Wang, R., Wang, S. & Zhou, X. Parallelizing approximate single-source personalized PageRank queries on shared memory. The VLDB Journal 28, 923–940 (2019). https://doi.org/10.1007/s00778-019-00576-7

Received: 23 February 2019
Revised: 20 August 2019
Accepted: 21 September 2019
Published: 08 October 2019
Issue Date: December 2019
DOI: https://doi.org/10.1007/s00778-019-00576-7
