Skip to main content
Log in

Toward High-Performance Delta-Based Iterative Processing with a Group-Based Approach

  • Regular Paper
  • Published:
Journal of Computer Science and Technology Aims and scope Submit manuscript

Abstract

Many systems have been built to employ the delta-based iterative execution model to support iterative algorithms on distributed platforms by exploiting the sparse computational dependencies between data items of these iterative algorithms in a synchronous or asynchronous approach. However, for large-scale iterative algorithms, existing synchronous solutions suffer from slow convergence speed and load imbalance, because of the strict barrier between iterations; while existing asynchronous approaches induce excessive redundant communication and computation cost as a result of being barrier-free. In view of the performance trade-off between these two approaches, this paper designs an efficient execution manager, called Aiter-R, which can be integrated into existing delta-based iterative processing systems to efficiently support the execution of delta-based iterative algorithms, by using our proposed group-based iterative execution approach. It can efficiently and correctly explore the middle ground of the two extremes. A heuristic scheduling algorithm is further proposed to allow an iterative algorithm to adaptively choose its trade-off point so as to achieve the maximum efficiency. Experimental results show that Aiter-R strikes a good balance between the synchronous and asynchronous policies and outperforms state-of-the-art solutions. It reduces the execution time by up to 54.1% and 84.6% in comparison with existing asynchronous and the synchronous models, respectively.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Similar content being viewed by others

References

  1. Zhang Y, Gao Q, Gao L, Wang C. Maiter: An asynchronous graph processing framework for delta-based accumulative iterative computation. IEEE Transactions on Parallel and Distributed Systems, 2014, 25(8): 2091-2100. DOI: https://doi.org/10.1109/TPDS.2013.235.

    Article  Google Scholar 

  2. Gonzalez J E, Low Y, Gu H, Bickson D, Guestrin C. PowerGraph: Distributed graph-parallel computation on natural graphs. In Proc. the 10th USENIX Symposium on Operating Systems Design and Implementation, Oct. 2012, pp.17-30.

  3. Mihaylov S R, Ives Z G, Guha S. REX: Recursive, delta-based data-centric computation. Proc. the VLDB Endowment, 2012, 5(11): 1280-1291. DOI: https://doi.org/10.14778/2350229.2350246.

    Article  Google Scholar 

  4. Yu W, Lin X, Zhang W. Fast incremental SimRank on link-evolving graphs. In Proc. the 30th IEEE International Conference on Data Engineering, Mar. 31-Apr. 4, 2014, pp.304-315. DOI: 10.1109/ICDE.2014.6816660.

  5. Zhang Y, Chen S,Wang Q, Yu G. i2MapReduce: Incremental MapReduce for mining evolving big data. IEEE Transactions on Knowledge and Data Engineering, 2015, 27(7): 1906-1919. DOI: https://doi.org/10.1109/TKDE.2015.2397438.

    Article  Google Scholar 

  6. Zhang Y, Liao X, Jin H, Gu L, Zhou B B. FBSGraph: Accelerating asynchronous graph processing via forward and backward sweeping. IEEE Transactions on Knowledge and Data Engineering, 2018, 30(5): 895-907. DOI: https://doi.org/10.1109/TKDE.2017.2781241.

    Article  Google Scholar 

  7. Zhang Y, Liao X, Shi X, Jin H, He B. Efficient disk-based directed graph processing: A strongly connected component approach. IEEE Transactions on Parallel and Distributed Systems, 2018, 29(4): 830-842. DOI: https://doi.org/10.1109/TPDS.2017.2776115.

    Article  Google Scholar 

  8. Hou G, Chen X, Wang S, Wei Z. Massively parallel algorithms for personalized PageRank. Proc. the VLDB Endowment, 2021, 14(9): 1668-1680. DOI: https://doi.org/10.14778/3461535.3461554.

    Article  Google Scholar 

  9. Chen H, Jin H, Cui X. Hybrid followee recommendation in microblogging systems. Science China Information Sciences, 2017, 60(1): Article No. 012102. DOI: 10.1007/s11432-016-5551-7.

  10. Liao X, Chen Y, Zhang Y et al. An efficient incremental strongly connected components algorithm for evolving directed graphs. Scientia Sinica Informationis, 2019, 49(8): 988-1004. DOI: https://doi.org/10.1360/N112018-00125. (in Chinese)

    Article  Google Scholar 

  11. Baluja S, Seth R, Sivakumar D et al. Video suggestion and discovery for YouTube: Taking random walks through the view graph. In Proc. the 17th International Conference on World Wide Web, Apr. 2008, pp. 895-904. DOI: 10.1145/1367497.1367618.

  12. Liben-Nowell D, Kleinberg J. The link prediction problem for social networks. In Proc. the 12th International Conference on Information and Knowledge Management, Nov. 2003, pp.556-559. DOI: 10.1145/956863.956972.

  13. Shroff G M. A parallel algorithm for the eigenvalues and eigenvectors of a general complex matrix. Numerische Mathematik, 1990, 58(1): 779-805. DOI: https://doi.org/10.1007/BF01385654.

    Article  MathSciNet  MATH  Google Scholar 

  14. Zhang Y, Liao X, Jin H, Min G. Resisting skew-accumulation for time-stepped applications in the cloud via exploiting parallelism. IEEE Transactions on Cloud Computing, 2015, 3(1): 54-65. DOI: https://doi.org/10.1109/TCC.2014.2328594.

    Article  Google Scholar 

  15. Zhang Y, Liao X, Jin H, Tan G, Min G. Inc-part: Incremental partitioning for load balancing in large-scale behavioral simulations. IEEE Transactions on Parallel and Distributed Systems, 2015, 26(7): 1900-1909. DOI: https://doi.org/10.1109/TPDS.2014.2333511.

    Article  Google Scholar 

  16. Ekanayake J, Li H, Zhang B, Gunarathne T, Bae S, Qiu J, Fox G. Twister: A runtime for iterative MapReduce. In Proc. the 19th ACM International Symposium on High Performance Distributed Computing, Jun. 2010, pp.810-818. DOI: 10.1145/1851476.1851593.

  17. Bu Y, Howe B, Balazinska M, Ernst M D. HaLoop: Efficient iterative data processing on large clusters. Proc. the VLDB Endowment, 2010, 3(1): 285-296. DOI: https://doi.org/10.14778/1920841.1920881.

    Article  Google Scholar 

  18. Power R, Li J. Piccolo: Building fast, distributed programs with partitioned tables. In Proc. the 9th USENIX Conference on Operating Systems Design and Implementation, Oct. 2010, pp.293-306.

  19. Zaharia M, Chowdhury M, FranklinM J, Shenker S, Stoica I. Spark: Cluster computing with working sets. In Proc. the 2nd USENIX Workshop on Hot Topics in Cloud Computing, Jun. 2010.

  20. Malewicz G, Austern M H, Bik A J C, Dehnert J C, Horn I, Leiser N, Czajkowski G. Pregel: A system for large-scale graph processing. In Proc. the 2010 ACM SIGMOD International Conference on Management of Data, Jun. 2010, pp.135-146. DOI: 10.1145/1807167.1807184.

  21. Roy A, Bindschaedler L, Malicevic J, Zwaenepoel W. Chaos: Scale-out graph processing from secondary storage. In Proc. the 25th Symposium on Operating Systems Principles, Oct. 2015, pp.410-424. DOI: 10.1145/2815400.2815408.

  22. Chen R, Shi J, Chen Y, Chen H. PowerLyra: Differentiated graph computation and partitioning on skewed graphs. In Proc. the 10th European Conference on Computer Systems, Apr. 2015, Article No. 1. DOI: https://doi.org/10.1145/2741948.2741970.

  23. Chazan D, Miranker W. Chaotic relaxation. Linear Algebra and Its Applications, 1969, 2(2): 199-222. DOI: https://doi.org/10.1016/0024-3795(69)90028-7.

    Article  MathSciNet  MATH  Google Scholar 

  24. Baudet G M. Asynchronous iterative methods for multiprocessors. Journal of the ACM, 1978, 25(2): 226-244. DOI: https://doi.org/10.1145/322063.322067.

    Article  MathSciNet  MATH  Google Scholar 

  25. Bertsekas D P. Distributed asynchronous computation of fixed points. Mathematical Programming, 1983, 27(1): 107-120. DOI: https://doi.org/10.1007/BF02591967.

    Article  MathSciNet  MATH  Google Scholar 

  26. Liu H K, Chen D, Jin H, Liao X F, He B S, Hu K, Zhang Y. A survey of non-volatile main memory technologies: State-of-the-arts, practices, and future directions. Journal of Computer Science and Technology, 2021, 36(1): 4-32. DOI: https://doi.org/10.1007/s11390-020-0780-z.

    Article  Google Scholar 

  27. Lv X Q, Xiao W, Zhang Y, Liao X F, Jin H, Hua S Q. An effective framework for asynchronous incremental graph processing. Frontiers of Computer Science, 2019, 13(3): 539-551. DOI: https://doi.org/10.1007/s11704-018-7443-z.

    Article  Google Scholar 

  28. Murray D G, Schwarzkopf M, Smowton C, Smith S, Mad-havapeddy A, Hand S. CIEL: A universal execution engine for distributed data-ow computing. In Proc. the 8th USENIX Conference on Networked Systems Design and Implementation, Mar. 30-Apr. 1, 2011, pp.113-126.

  29. Dai D, Chen Y, Kimpe D, Ross R B. Trigger-based incremental data processing with unified sync and async model. IEEE Transactions on Cloud Computing, 2021, 9(1): 372-385. DOI: https://doi.org/10.1109/TCC.2018.2830348.

    Article  Google Scholar 

  30. Zhang Y, Gao Q, Gao L, Wang C. PrIter: A distributed framework for prioritized iterative computations. In Proc. the 2nd ACM Symposium on Cloud Computing, Oct. 2011, Article No. 13. DOI: https://doi.org/10.1145/2038916.2038929.

  31. Talács G, Pilászy I, Németh B, Tikk D. Scalable collaborative filtering approaches for large recommender systems. Journal of Machine Learning Research, 2009, 10: 623-656.

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Yu Zhang.

Supplementary Information

ESM 1

(PDF 131 kb)

Rights and permissions

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Yu, H., Jiang, XY., Zhao, J. et al. Toward High-Performance Delta-Based Iterative Processing with a Group-Based Approach. J. Comput. Sci. Technol. 37, 797–813 (2022). https://doi.org/10.1007/s11390-022-2101-1

Download citation

  • Received:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11390-022-2101-1

Keywords

Navigation