Skip to main content
Log in

DiffPageRank: an efficient differential PageRank approach in MapReduce

  • Published:
The Journal of Supercomputing Aims and scope Submit manuscript

Abstract

Unstructured big data processing requires efficient computational styles to rapidly analyze continuously changing data. Incremental processing is a promising technique to update search results without re-computing the whole process. One of the most important mining algorithms is PageRank which necessitates incremental processing and iterative computations to keep the mining results up-to-date. In this paper, a light and efficient PageRank algorithm is proposed using differential dataflow in the MapReduce programming style named DiffPageRank. The innovation of the proposed approach is investigated from two points of view: (1) the updating state is dependent on a partial order of changes in DiffPageRank computing, while the changes are applied based on complete order sequences in the standard incremental computing; and (2) in DiffPageRank, a set of updates rebuilds each version, while in the incremental systems, every single update continually bring up-to-date the current version state. DiffPageRank is compared with two other implementations of PageRank including standard and incremental PageRank in MapReduce in terms of processing time, CPU utilization and update speed of data mining results. The results show that if the changes on the input data are large, the efficiency of the differential method will considerably increase in terms of time than the standard and incremental PageRank approaches up to 61% and 86%, respectively. DiffPageRank is also compared with two state-of-the-art incremental graph processing benchmarks on Orkut and Twitter datasets in terms of execution time and the number of updates. The experimental comparisons on benchmark datasets show that our method reduces the number of updates and total execution time by using the differential MapReduce approach.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9
Fig. 10
Fig. 11
Fig. 12
Fig. 13
Fig. 14

Similar content being viewed by others

References

  1. Brin S, Page L (1998) The anatomy of a large-scale hypertextual web search engine. Comput Netw ISDN Syst 30(1–7):107–117

    Article  Google Scholar 

  2. White T (2012) Hadoop: the definitive guide. O’Reilly Media, Inc., Sebastopol

    Google Scholar 

  3. Shvachko K, Kuang H, Radia S, Chansler R (2010) The hadoop distributed file system. In: 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST). IEEE, pp 1–10

  4. Dean J, Ghemawat S (2008) MapReduce: simplified data processing on large clusters. Commun ACM 51(1):107–113

    Article  Google Scholar 

  5. Maleki N, Rahmani AM, Conti M (2019) MapReduce: an infrastructure review and research insights. J Supercomput 75(10):6934–7002

    Article  Google Scholar 

  6. Gupta D, Rani R (2018) A study of big data evolution and research challenges. J Inf Sci 45:322–340

    Article  Google Scholar 

  7. Talan PP, Sharma KU, Nawade PP, Talan KP (2019) An overview of Hadoop MapReduce, spark, and scalable graph processing architecture. In: Kalita J, Balas VE, Borah S, Pradhan R (eds) Recent developments in machine learning and data analytics. Springer, Berlin, pp 35–42

    Chapter  Google Scholar 

  8. Kwak H, Lee C, Park H, Moon S (2010) What is Twitter, a social network or a news media? In: Proceedings of the 19th International Conference on World Wide Web. ACM, pp 591–600

  9. Zhang Y, Chen S, Wang Q, Yu G (2016) i2MapReduce: incremental MapReduce for mining evolving big data. In: 2016 IEEE 32nd International Conference on Data Engineering (ICDE). IEEE, pp 1482–1483

  10. McSherry F, Murray DG, Isaacs R, Isard M (2013) Differential dataflow. In: CIDR

  11. Bhawiyuga A, Kirana AP (2016) Implementation of page rank algorithm in Hadoop MapReduce framework. In: 2016 International Seminar on Intelligent Technology and Its Applications (ISITIA). IEEE, pp 231–236

  12. Murray DG, McSherry F, Isaacs R, Isard M, Barham P, Abadi M (2013) Naiad: a timely dataflow system. In: Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles. ACM, pp 439–455

  13. Murray DG, McSherry F, Isard M, Isaacs R, Barham P, Abadi M (2016) Incremental, iterative data processing with timely dataflow. Commun ACM 59(10):75–83

    Article  Google Scholar 

  14. Pasquinelli M (2009) Google’s PageRank algorithm: a diagram of cognitive capitalism and the rentier of the common intellect. In: Becker K, Stalder F (eds) Deep search: the politics of search beyond Google. Studien Verlag, Innsbruck, pp 152–163

    Google Scholar 

  15. Cauwenberghs G, Poggio T (2001) Incremental and decremental support vector machine learning. In: Advances in Neural Information Processing Systems, pp 409–415

  16. Peng D, Dabek F (2010) Large-scale incremental processing using distributed transactions and notifications. In: OSDI, vol 10, pp 1–15

  17. Popa L, Budiu M, Yu Y, Isard M (2009) DryadInc: reusing work in large-scale computations. HotCloud 9:2–6

    Google Scholar 

  18. Logothetis D, Olston C, Reed B, Webb KC, Yocum K (2010) Stateful bulk processing for incremental analytics. In: Proceedings of the 1st ACM Symposium on Cloud Computing, pp 51–62

  19. Lee D, Kim J-S, Maeng S (2014) Large-scale incremental processing with MapReduce. Future Gener Comput Syst 36:66–79

    Article  Google Scholar 

  20. Zhang Y, Chen S (2013) i2MapReduce: incremental iterative MapReduce. In: Proceedings of the 2nd International Workshop on Cloud Intelligence, pp 1–4

  21. Jörg T, Parvizi R, Yong H, Dessloch S (2011) Incremental recomputations in mapreduce. In: Proceedings of the Third International Workshop on Cloud Data Management, pp 7–14

  22. Saadon AGB, Mokhtar HM (2019) Survey on iterative and incremental approaches in distributed computing environment. Int J Data Sci 4(1):18–30

    Article  Google Scholar 

  23. Bhatotia P, Wieder A, Rodrigues R, Acar UA, Pasquin, R (2011) Incoop: MapReduce for incremental computations. In: Proceedings of the 2nd ACM Symposium on Cloud Computing. ACM, p 7

  24. Logothetis D, Olston C, Reed B, Webb KC, Yocum K (2010) Stateful bulk processing for incremental analytics. In: Presented at the Proceedings of the 1st ACM Symposium on Cloud Computing—SoCC ‘10. http://dx.doi.org/10.1145/1807128.1807138

  25. McSherry FD, Isaacs R, Isard MA, Murray DG (2015) Differential dataflow, ed: Google Patents

  26. Cheng R et al (2012) Kineograph: taking the pulse of a fast-changing and connected world. In: Proceedings of the 7th ACM European Conference on Computer Systems, pp 85–98

  27. Yin J, Gao L (2016) Asynchronous distributed incremental computation on evolving graphs. In: Joint European Conference on Machine Learning and Knowledge Discovery in Databases. Springer, pp 722–738

  28. Lv X, Xiao W, Zhang Y, Liao X, Jin H, Hua Q (2019) An effective framework for asynchronous incremental graph processing. Front Comput Sci 13(3):539–551

    Article  Google Scholar 

  29. Park S, Lee W, Choe B, Lee S-G (2019) A survey on personalized PageRank computation algorithms. IEEE Access 7:163049–163062

    Article  Google Scholar 

  30. Bahmani B, Chowdhury A, Goel A (2010) Fast incremental and personalized pagerank. Proc VLDB Endow 4(3):173–184

    Article  Google Scholar 

  31. Abdullah IB (2010) Incremental pagerank for twitter data using hadoop. Technical paper

  32. Desikan P, Pathak N, Srivastava J, Kumar V (2005) Incremental page rank computation on evolving graphs. In: Special Interest Tracks and Posters of the 14th International Conference on World Wide Web. ACM, pp 1094–1095

  33. Kim KS, Choi YS (2015) Incremental iteration method for fast pagerank computation. In: Proceedings of the 9th International Conference on Ubiquitous Information Management and Communication. ACM, p 80

  34. Lin W (2019) Distributed algorithms for fully personalized pagerank on large graphs. In: The World Wide Web Conference, pp 1084–1094

  35. Hearst MA, Dumais ST, Osuna E, Platt J, Scholkopf B (1998) Support vector machines. IEEE Intell Syst Appl 13(4):18–28

    Article  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Maryam Nooraei Abadeh.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Nooraei Abadeh, M., Mirzaie, M. DiffPageRank: an efficient differential PageRank approach in MapReduce. J Supercomput 77, 188–211 (2021). https://doi.org/10.1007/s11227-020-03265-3

Download citation

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11227-020-03265-3

Keywords

Navigation