DiffPageRank: an efficient differential PageRank approach in MapReduce

Nooraei Abadeh, Maryam; Mirzaie, Mansooreh

doi:10.1007/s11227-020-03265-3

Published: 30 March 2020

Volume 77, pages 188–211, (2021)

The Journal of Supercomputing

Maryam Nooraei Abadeh¹ &
Mansooreh Mirzaie²


Abstract

Unstructured big data processing requires efficient computational styles to rapidly analyze continuously changing data. Incremental processing is a promising technique for updating search results without recomputing the whole process. One of the most important mining algorithms is PageRank, which requires incremental processing and iterative computation to keep the mining results up to date. In this paper, a lightweight and efficient PageRank algorithm named DiffPageRank is proposed, using differential dataflow in the MapReduce programming style. The innovation of the proposed approach is examined from two points of view: (1) in DiffPageRank, the updated state depends on a partial order of changes, whereas in standard incremental computing the changes are applied in a totally ordered sequence; and (2) in DiffPageRank, a set of updates rebuilds each version, whereas in incremental systems every single update continually brings the current version state up to date. DiffPageRank is compared with two other MapReduce implementations of PageRank, standard and incremental, in terms of processing time, CPU utilization, and update speed of the data mining results. The results show that when the changes to the input data are large, the differential method is considerably more time-efficient than the standard and incremental PageRank approaches, by up to 61% and 86%, respectively. DiffPageRank is also compared with two state-of-the-art incremental graph processing benchmarks on the Orkut and Twitter datasets in terms of execution time and number of updates. The experimental comparisons on these benchmark datasets show that the differential MapReduce approach reduces both the number of updates and the total execution time.
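To make the setting concrete, the following is a minimal sketch of PageRank written as a map phase and a reduce phase, followed by a recomputation after a batch of edge changes. It is not the authors' DiffPageRank and does not use Hadoop or a differential dataflow engine: plain Python functions stand in for the mapper and reducer, and the names map_phase, reduce_phase, apply_changes, the damping factor 0.85, and the warm start from the previous ranks are all assumptions made for illustration. The sketch also assumes every node keeps at least one outgoing link and that the changes add or remove edges only, not nodes.

```python
from collections import defaultdict

DAMPING = 0.85  # conventional damping factor; an assumption, not taken from the paper


def map_phase(graph, ranks):
    """Emit (node, contribution) pairs, as a MapReduce mapper would."""
    for node, targets in graph.items():
        yield node, 0.0  # make sure every node reaches the reducer
        for target in targets:
            yield target, ranks[node] / len(targets)


def reduce_phase(pairs, n):
    """Sum the contributions per node and apply the PageRank formula."""
    totals = defaultdict(float)
    for node, contribution in pairs:
        totals[node] += contribution
    return {node: (1 - DAMPING) / n + DAMPING * total
            for node, total in totals.items()}


def pagerank(graph, ranks=None, tol=1e-10, max_iter=1000):
    """Iterate map and reduce until the ranks change by less than tol."""
    n = len(graph)
    ranks = dict(ranks) if ranks else {node: 1.0 / n for node in graph}
    for iteration in range(1, max_iter + 1):
        new_ranks = reduce_phase(map_phase(graph, ranks), n)
        delta = sum(abs(new_ranks[v] - ranks[v]) for v in graph)
        ranks = new_ranks
        if delta < tol:
            break
    return ranks, iteration


def apply_changes(graph, added=(), removed=()):
    """Apply a whole set of edge changes at once, yielding a new graph version."""
    new_graph = {node: set(targets) for node, targets in graph.items()}
    for src, dst in removed:
        new_graph[src].discard(dst)
    for src, dst in added:
        new_graph[src].add(dst)
    return new_graph


# Hypothetical three-page graph: each key links to the pages in its set.
graph_v1 = {"a": {"b", "c"}, "b": {"c"}, "c": {"a"}}
ranks_v1, _ = pagerank(graph_v1)

# A set of updates produces version 2; the old ranks seed the new computation.
graph_v2 = apply_changes(graph_v1, added=[("b", "a")])
ranks_warm, warm_iters = pagerank(graph_v2, ranks_v1)
ranks_cold, cold_iters = pagerank(graph_v2)
```

Here apply_changes loosely mirrors the abstract's point that a set of updates rebuilds a version, rather than each update being folded in one at a time. The warm start is only a simple stand-in for reusing earlier results; comparing warm_iters with cold_iters shows how much work it saves on a given graph, which the sketch does not predict.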

Author information

Authors and Affiliations

Department of Computer Engineering, Abadan Branch, Islamic Azad University, Abadan, Iran
Maryam Nooraei Abadeh
Faculty of Electrical and Computer Engineering, Golpayegan University of Technology, Golpayegan, Iran
Mansooreh Mirzaie

Corresponding author

Correspondence to Maryam Nooraei Abadeh.

About this article

Cite this article

Nooraei Abadeh, M., Mirzaie, M. DiffPageRank: an efficient differential PageRank approach in MapReduce. J Supercomput 77, 188–211 (2021). https://doi.org/10.1007/s11227-020-03265-3

Issue Date: January 2021
DOI: https://doi.org/10.1007/s11227-020-03265-3
