Abstract
Many systems have been built to employ the delta-based iterative execution model to support iterative algorithms on distributed platforms by exploiting the sparse computational dependencies between data items of these iterative algorithms in a synchronous or asynchronous approach. However, for large-scale iterative algorithms, existing synchronous solutions suffer from slow convergence speed and load imbalance, because of the strict barrier between iterations; while existing asynchronous approaches induce excessive redundant communication and computation cost as a result of being barrier-free. In view of the performance trade-off between these two approaches, this paper designs an efficient execution manager, called Aiter-R, which can be integrated into existing delta-based iterative processing systems to efficiently support the execution of delta-based iterative algorithms, by using our proposed group-based iterative execution approach. It can efficiently and correctly explore the middle ground of the two extremes. A heuristic scheduling algorithm is further proposed to allow an iterative algorithm to adaptively choose its trade-off point so as to achieve the maximum efficiency. Experimental results show that Aiter-R strikes a good balance between the synchronous and asynchronous policies and outperforms state-of-the-art solutions. It reduces the execution time by up to 54.1% and 84.6% in comparison with existing asynchronous and the synchronous models, respectively.
Similar content being viewed by others
References
Zhang Y, Gao Q, Gao L, Wang C. Maiter: An asynchronous graph processing framework for delta-based accumulative iterative computation. IEEE Transactions on Parallel and Distributed Systems, 2014, 25(8): 2091-2100. DOI: https://doi.org/10.1109/TPDS.2013.235.
Gonzalez J E, Low Y, Gu H, Bickson D, Guestrin C. PowerGraph: Distributed graph-parallel computation on natural graphs. In Proc. the 10th USENIX Symposium on Operating Systems Design and Implementation, Oct. 2012, pp.17-30.
Mihaylov S R, Ives Z G, Guha S. REX: Recursive, delta-based data-centric computation. Proc. the VLDB Endowment, 2012, 5(11): 1280-1291. DOI: https://doi.org/10.14778/2350229.2350246.
Yu W, Lin X, Zhang W. Fast incremental SimRank on link-evolving graphs. In Proc. the 30th IEEE International Conference on Data Engineering, Mar. 31-Apr. 4, 2014, pp.304-315. DOI: 10.1109/ICDE.2014.6816660.
Zhang Y, Chen S,Wang Q, Yu G. i2MapReduce: Incremental MapReduce for mining evolving big data. IEEE Transactions on Knowledge and Data Engineering, 2015, 27(7): 1906-1919. DOI: https://doi.org/10.1109/TKDE.2015.2397438.
Zhang Y, Liao X, Jin H, Gu L, Zhou B B. FBSGraph: Accelerating asynchronous graph processing via forward and backward sweeping. IEEE Transactions on Knowledge and Data Engineering, 2018, 30(5): 895-907. DOI: https://doi.org/10.1109/TKDE.2017.2781241.
Zhang Y, Liao X, Shi X, Jin H, He B. Efficient disk-based directed graph processing: A strongly connected component approach. IEEE Transactions on Parallel and Distributed Systems, 2018, 29(4): 830-842. DOI: https://doi.org/10.1109/TPDS.2017.2776115.
Hou G, Chen X, Wang S, Wei Z. Massively parallel algorithms for personalized PageRank. Proc. the VLDB Endowment, 2021, 14(9): 1668-1680. DOI: https://doi.org/10.14778/3461535.3461554.
Chen H, Jin H, Cui X. Hybrid followee recommendation in microblogging systems. Science China Information Sciences, 2017, 60(1): Article No. 012102. DOI: 10.1007/s11432-016-5551-7.
Liao X, Chen Y, Zhang Y et al. An efficient incremental strongly connected components algorithm for evolving directed graphs. Scientia Sinica Informationis, 2019, 49(8): 988-1004. DOI: https://doi.org/10.1360/N112018-00125. (in Chinese)
Baluja S, Seth R, Sivakumar D et al. Video suggestion and discovery for YouTube: Taking random walks through the view graph. In Proc. the 17th International Conference on World Wide Web, Apr. 2008, pp. 895-904. DOI: 10.1145/1367497.1367618.
Liben-Nowell D, Kleinberg J. The link prediction problem for social networks. In Proc. the 12th International Conference on Information and Knowledge Management, Nov. 2003, pp.556-559. DOI: 10.1145/956863.956972.
Shroff G M. A parallel algorithm for the eigenvalues and eigenvectors of a general complex matrix. Numerische Mathematik, 1990, 58(1): 779-805. DOI: https://doi.org/10.1007/BF01385654.
Zhang Y, Liao X, Jin H, Min G. Resisting skew-accumulation for time-stepped applications in the cloud via exploiting parallelism. IEEE Transactions on Cloud Computing, 2015, 3(1): 54-65. DOI: https://doi.org/10.1109/TCC.2014.2328594.
Zhang Y, Liao X, Jin H, Tan G, Min G. Inc-part: Incremental partitioning for load balancing in large-scale behavioral simulations. IEEE Transactions on Parallel and Distributed Systems, 2015, 26(7): 1900-1909. DOI: https://doi.org/10.1109/TPDS.2014.2333511.
Ekanayake J, Li H, Zhang B, Gunarathne T, Bae S, Qiu J, Fox G. Twister: A runtime for iterative MapReduce. In Proc. the 19th ACM International Symposium on High Performance Distributed Computing, Jun. 2010, pp.810-818. DOI: 10.1145/1851476.1851593.
Bu Y, Howe B, Balazinska M, Ernst M D. HaLoop: Efficient iterative data processing on large clusters. Proc. the VLDB Endowment, 2010, 3(1): 285-296. DOI: https://doi.org/10.14778/1920841.1920881.
Power R, Li J. Piccolo: Building fast, distributed programs with partitioned tables. In Proc. the 9th USENIX Conference on Operating Systems Design and Implementation, Oct. 2010, pp.293-306.
Zaharia M, Chowdhury M, FranklinM J, Shenker S, Stoica I. Spark: Cluster computing with working sets. In Proc. the 2nd USENIX Workshop on Hot Topics in Cloud Computing, Jun. 2010.
Malewicz G, Austern M H, Bik A J C, Dehnert J C, Horn I, Leiser N, Czajkowski G. Pregel: A system for large-scale graph processing. In Proc. the 2010 ACM SIGMOD International Conference on Management of Data, Jun. 2010, pp.135-146. DOI: 10.1145/1807167.1807184.
Roy A, Bindschaedler L, Malicevic J, Zwaenepoel W. Chaos: Scale-out graph processing from secondary storage. In Proc. the 25th Symposium on Operating Systems Principles, Oct. 2015, pp.410-424. DOI: 10.1145/2815400.2815408.
Chen R, Shi J, Chen Y, Chen H. PowerLyra: Differentiated graph computation and partitioning on skewed graphs. In Proc. the 10th European Conference on Computer Systems, Apr. 2015, Article No. 1. DOI: https://doi.org/10.1145/2741948.2741970.
Chazan D, Miranker W. Chaotic relaxation. Linear Algebra and Its Applications, 1969, 2(2): 199-222. DOI: https://doi.org/10.1016/0024-3795(69)90028-7.
Baudet G M. Asynchronous iterative methods for multiprocessors. Journal of the ACM, 1978, 25(2): 226-244. DOI: https://doi.org/10.1145/322063.322067.
Bertsekas D P. Distributed asynchronous computation of fixed points. Mathematical Programming, 1983, 27(1): 107-120. DOI: https://doi.org/10.1007/BF02591967.
Liu H K, Chen D, Jin H, Liao X F, He B S, Hu K, Zhang Y. A survey of non-volatile main memory technologies: State-of-the-arts, practices, and future directions. Journal of Computer Science and Technology, 2021, 36(1): 4-32. DOI: https://doi.org/10.1007/s11390-020-0780-z.
Lv X Q, Xiao W, Zhang Y, Liao X F, Jin H, Hua S Q. An effective framework for asynchronous incremental graph processing. Frontiers of Computer Science, 2019, 13(3): 539-551. DOI: https://doi.org/10.1007/s11704-018-7443-z.
Murray D G, Schwarzkopf M, Smowton C, Smith S, Mad-havapeddy A, Hand S. CIEL: A universal execution engine for distributed data-ow computing. In Proc. the 8th USENIX Conference on Networked Systems Design and Implementation, Mar. 30-Apr. 1, 2011, pp.113-126.
Dai D, Chen Y, Kimpe D, Ross R B. Trigger-based incremental data processing with unified sync and async model. IEEE Transactions on Cloud Computing, 2021, 9(1): 372-385. DOI: https://doi.org/10.1109/TCC.2018.2830348.
Zhang Y, Gao Q, Gao L, Wang C. PrIter: A distributed framework for prioritized iterative computations. In Proc. the 2nd ACM Symposium on Cloud Computing, Oct. 2011, Article No. 13. DOI: https://doi.org/10.1145/2038916.2038929.
Talács G, Pilászy I, Németh B, Tikk D. Scalable collaborative filtering approaches for large recommender systems. Journal of Machine Learning Research, 2009, 10: 623-656.
Author information
Authors and Affiliations
Corresponding author
Supplementary Information
ESM 1
(PDF 131 kb)
Rights and permissions
About this article
Cite this article
Yu, H., Jiang, XY., Zhao, J. et al. Toward High-Performance Delta-Based Iterative Processing with a Group-Based Approach. J. Comput. Sci. Technol. 37, 797–813 (2022). https://doi.org/10.1007/s11390-022-2101-1
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11390-022-2101-1