Toward High-Performance Delta-Based Iterative Processing with a Group-Based Approach

Yu, Hui; Jiang, Xin-Yu; Zhao, Jin; Qi, Hao; Zhang, Yu; Liao, Xiao-Fei; Liu, Hai-Kun; Mao, Fu-Bing; Jin, Hai

doi:10.1007/s11390-022-2101-1

Toward High-Performance Delta-Based Iterative Processing with a Group-Based Approach

Regular Paper
Published: 30 July 2022

Volume 37, pages 797–813, (2022)
Cite this article

Journal of Computer Science and Technology Aims and scope Submit manuscript

Hui Yu^1,2,3,
Xin-Yu Jiang^1,2,3,
Jin Zhao^1,2,3,
Hao Qi^1,2,3,
Yu Zhang^1,2,3,
Xiao-Fei Liao^1,2,3,
Hai-Kun Liu^1,2,3,
Fu-Bing Mao^1,2,3 &
…
Hai Jin^1,2,3

190 Accesses
2 Citations
Explore all metrics

Abstract

Many systems have been built to employ the delta-based iterative execution model to support iterative algorithms on distributed platforms by exploiting the sparse computational dependencies between data items of these iterative algorithms in a synchronous or asynchronous approach. However, for large-scale iterative algorithms, existing synchronous solutions suffer from slow convergence speed and load imbalance, because of the strict barrier between iterations; while existing asynchronous approaches induce excessive redundant communication and computation cost as a result of being barrier-free. In view of the performance trade-off between these two approaches, this paper designs an efficient execution manager, called Aiter-R, which can be integrated into existing delta-based iterative processing systems to efficiently support the execution of delta-based iterative algorithms, by using our proposed group-based iterative execution approach. It can efficiently and correctly explore the middle ground of the two extremes. A heuristic scheduling algorithm is further proposed to allow an iterative algorithm to adaptively choose its trade-off point so as to achieve the maximum efficiency. Experimental results show that Aiter-R strikes a good balance between the synchronous and asynchronous policies and outperforms state-of-the-art solutions. It reduces the execution time by up to 54.1% and 84.6% in comparison with existing asynchronous and the synchronous models, respectively.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Command Horizons: Coalescing Data Dependencies While Maintaining Asynchronicity

Improving Performance of Distributed Graph Traversals via Application-Aware Plug-In Work Scheduler

Balancing Tracking Granularity and Parallelism in Many-Task Systems: The Horizons Approach

Article Open access 06 April 2024

Discover the latest articles, news and stories from top researchers in related subjects.

Artificial Intelligence

References

Zhang Y, Gao Q, Gao L, Wang C. Maiter: An asynchronous graph processing framework for delta-based accumulative iterative computation. IEEE Transactions on Parallel and Distributed Systems, 2014, 25(8): 2091-2100. DOI: https://doi.org/10.1109/TPDS.2013.235.
Article Google Scholar
Gonzalez J E, Low Y, Gu H, Bickson D, Guestrin C. PowerGraph: Distributed graph-parallel computation on natural graphs. In Proc. the 10th USENIX Symposium on Operating Systems Design and Implementation, Oct. 2012, pp.17-30.
Mihaylov S R, Ives Z G, Guha S. REX: Recursive, delta-based data-centric computation. Proc. the VLDB Endowment, 2012, 5(11): 1280-1291. DOI: https://doi.org/10.14778/2350229.2350246.
Article Google Scholar
Yu W, Lin X, Zhang W. Fast incremental SimRank on link-evolving graphs. In Proc. the 30th IEEE International Conference on Data Engineering, Mar. 31-Apr. 4, 2014, pp.304-315. DOI: 10.1109/ICDE.2014.6816660.
Zhang Y, Chen S,Wang Q, Yu G. i²MapReduce: Incremental MapReduce for mining evolving big data. IEEE Transactions on Knowledge and Data Engineering, 2015, 27(7): 1906-1919. DOI: https://doi.org/10.1109/TKDE.2015.2397438.
Article Google Scholar
Zhang Y, Liao X, Jin H, Gu L, Zhou B B. FBSGraph: Accelerating asynchronous graph processing via forward and backward sweeping. IEEE Transactions on Knowledge and Data Engineering, 2018, 30(5): 895-907. DOI: https://doi.org/10.1109/TKDE.2017.2781241.
Article Google Scholar
Zhang Y, Liao X, Shi X, Jin H, He B. Efficient disk-based directed graph processing: A strongly connected component approach. IEEE Transactions on Parallel and Distributed Systems, 2018, 29(4): 830-842. DOI: https://doi.org/10.1109/TPDS.2017.2776115.
Article Google Scholar
Hou G, Chen X, Wang S, Wei Z. Massively parallel algorithms for personalized PageRank. Proc. the VLDB Endowment, 2021, 14(9): 1668-1680. DOI: https://doi.org/10.14778/3461535.3461554.
Article Google Scholar
Chen H, Jin H, Cui X. Hybrid followee recommendation in microblogging systems. Science China Information Sciences, 2017, 60(1): Article No. 012102. DOI: 10.1007/s11432-016-5551-7.
Liao X, Chen Y, Zhang Y et al. An efficient incremental strongly connected components algorithm for evolving directed graphs. Scientia Sinica Informationis, 2019, 49(8): 988-1004. DOI: https://doi.org/10.1360/N112018-00125. (in Chinese)
Article Google Scholar
Baluja S, Seth R, Sivakumar D et al. Video suggestion and discovery for YouTube: Taking random walks through the view graph. In Proc. the 17th International Conference on World Wide Web, Apr. 2008, pp. 895-904. DOI: 10.1145/1367497.1367618.
Liben-Nowell D, Kleinberg J. The link prediction problem for social networks. In Proc. the 12th International Conference on Information and Knowledge Management, Nov. 2003, pp.556-559. DOI: 10.1145/956863.956972.
Shroff G M. A parallel algorithm for the eigenvalues and eigenvectors of a general complex matrix. Numerische Mathematik, 1990, 58(1): 779-805. DOI: https://doi.org/10.1007/BF01385654.
Article MathSciNet MATH Google Scholar
Zhang Y, Liao X, Jin H, Min G. Resisting skew-accumulation for time-stepped applications in the cloud via exploiting parallelism. IEEE Transactions on Cloud Computing, 2015, 3(1): 54-65. DOI: https://doi.org/10.1109/TCC.2014.2328594.
Article Google Scholar
Zhang Y, Liao X, Jin H, Tan G, Min G. Inc-part: Incremental partitioning for load balancing in large-scale behavioral simulations. IEEE Transactions on Parallel and Distributed Systems, 2015, 26(7): 1900-1909. DOI: https://doi.org/10.1109/TPDS.2014.2333511.
Article Google Scholar
Ekanayake J, Li H, Zhang B, Gunarathne T, Bae S, Qiu J, Fox G. Twister: A runtime for iterative MapReduce. In Proc. the 19th ACM International Symposium on High Performance Distributed Computing, Jun. 2010, pp.810-818. DOI: 10.1145/1851476.1851593.
Bu Y, Howe B, Balazinska M, Ernst M D. HaLoop: Efficient iterative data processing on large clusters. Proc. the VLDB Endowment, 2010, 3(1): 285-296. DOI: https://doi.org/10.14778/1920841.1920881.
Article Google Scholar
Power R, Li J. Piccolo: Building fast, distributed programs with partitioned tables. In Proc. the 9th USENIX Conference on Operating Systems Design and Implementation, Oct. 2010, pp.293-306.
Zaharia M, Chowdhury M, FranklinM J, Shenker S, Stoica I. Spark: Cluster computing with working sets. In Proc. the 2nd USENIX Workshop on Hot Topics in Cloud Computing, Jun. 2010.
Malewicz G, Austern M H, Bik A J C, Dehnert J C, Horn I, Leiser N, Czajkowski G. Pregel: A system for large-scale graph processing. In Proc. the 2010 ACM SIGMOD International Conference on Management of Data, Jun. 2010, pp.135-146. DOI: 10.1145/1807167.1807184.
Roy A, Bindschaedler L, Malicevic J, Zwaenepoel W. Chaos: Scale-out graph processing from secondary storage. In Proc. the 25th Symposium on Operating Systems Principles, Oct. 2015, pp.410-424. DOI: 10.1145/2815400.2815408.
Chen R, Shi J, Chen Y, Chen H. PowerLyra: Differentiated graph computation and partitioning on skewed graphs. In Proc. the 10th European Conference on Computer Systems, Apr. 2015, Article No. 1. DOI: https://doi.org/10.1145/2741948.2741970.
Chazan D, Miranker W. Chaotic relaxation. Linear Algebra and Its Applications, 1969, 2(2): 199-222. DOI: https://doi.org/10.1016/0024-3795(69)90028-7.
Article MathSciNet MATH Google Scholar
Baudet G M. Asynchronous iterative methods for multiprocessors. Journal of the ACM, 1978, 25(2): 226-244. DOI: https://doi.org/10.1145/322063.322067.
Article MathSciNet MATH Google Scholar
Bertsekas D P. Distributed asynchronous computation of fixed points. Mathematical Programming, 1983, 27(1): 107-120. DOI: https://doi.org/10.1007/BF02591967.
Article MathSciNet MATH Google Scholar
Liu H K, Chen D, Jin H, Liao X F, He B S, Hu K, Zhang Y. A survey of non-volatile main memory technologies: State-of-the-arts, practices, and future directions. Journal of Computer Science and Technology, 2021, 36(1): 4-32. DOI: https://doi.org/10.1007/s11390-020-0780-z.
Article Google Scholar
Lv X Q, Xiao W, Zhang Y, Liao X F, Jin H, Hua S Q. An effective framework for asynchronous incremental graph processing. Frontiers of Computer Science, 2019, 13(3): 539-551. DOI: https://doi.org/10.1007/s11704-018-7443-z.
Article Google Scholar
Murray D G, Schwarzkopf M, Smowton C, Smith S, Mad-havapeddy A, Hand S. CIEL: A universal execution engine for distributed data-ow computing. In Proc. the 8th USENIX Conference on Networked Systems Design and Implementation, Mar. 30-Apr. 1, 2011, pp.113-126.
Dai D, Chen Y, Kimpe D, Ross R B. Trigger-based incremental data processing with unified sync and async model. IEEE Transactions on Cloud Computing, 2021, 9(1): 372-385. DOI: https://doi.org/10.1109/TCC.2018.2830348.
Article Google Scholar
Zhang Y, Gao Q, Gao L, Wang C. PrIter: A distributed framework for prioritized iterative computations. In Proc. the 2nd ACM Symposium on Cloud Computing, Oct. 2011, Article No. 13. DOI: https://doi.org/10.1145/2038916.2038929.
Talács G, Pilászy I, Németh B, Tikk D. Scalable collaborative filtering approaches for large recommender systems. Journal of Machine Learning Research, 2009, 10: 623-656.
Google Scholar

Download references

Author information

Authors and Affiliations

National Engineering Research Center for Big Data Technology and System, Huazhong University of Science and Technology, Wuhan, 430074, China
Hui Yu, Xin-Yu Jiang, Jin Zhao, Hao Qi, Yu Zhang, Xiao-Fei Liao, Hai-Kun Liu, Fu-Bing Mao & Hai Jin
Service Computing Technology and System Laboratory, Huazhong University of Science and Technology, Wuhan, 430074, China
Hui Yu, Xin-Yu Jiang, Jin Zhao, Hao Qi, Yu Zhang, Xiao-Fei Liao, Hai-Kun Liu, Fu-Bing Mao & Hai Jin
Cluster and Grid Computing Laboratory, Huazhong University of Science and Technology, Wuhan 430074, China School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, 430074, China
Hui Yu, Xin-Yu Jiang, Jin Zhao, Hao Qi, Yu Zhang, Xiao-Fei Liao, Hai-Kun Liu, Fu-Bing Mao & Hai Jin

Authors

Hui Yu
View author publications
You can also search for this author inPubMed Google Scholar
Xin-Yu Jiang
View author publications
You can also search for this author inPubMed Google Scholar
Jin Zhao
View author publications
You can also search for this author inPubMed Google Scholar
Hao Qi
View author publications
You can also search for this author inPubMed Google Scholar
Yu Zhang
View author publications
You can also search for this author inPubMed Google Scholar
Xiao-Fei Liao
View author publications
You can also search for this author inPubMed Google Scholar
Hai-Kun Liu
View author publications
You can also search for this author inPubMed Google Scholar
Fu-Bing Mao
View author publications
You can also search for this author inPubMed Google Scholar
Hai Jin
View author publications
You can also search for this author inPubMed Google Scholar

Corresponding author

Correspondence to Yu Zhang.

Supplementary Information

ESM 1

(PDF 131 kb)

Rights and permissions

Reprints and permissions

About this article

Cite this article

Yu, H., Jiang, XY., Zhao, J. et al. Toward High-Performance Delta-Based Iterative Processing with a Group-Based Approach. J. Comput. Sci. Technol. 37, 797–813 (2022). https://doi.org/10.1007/s11390-022-2101-1

Download citation

Received: 21 December 2021
Accepted: 29 June 2022
Published: 30 July 2022
Issue Date: July 2022
DOI: https://doi.org/10.1007/s11390-022-2101-1

Keywords

Access this article

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Toward High-Performance Delta-Based Iterative Processing with a Group-Based Approach

Abstract

Access this article

Subscribe and save

Buy Now

Similar content being viewed by others

Command Horizons: Coalescing Data Dependencies While Maintaining Asynchronicity

Improving Performance of Distributed Graph Traversals via Application-Aware Plug-In Work Scheduler

Balancing Tracking Granularity and Parallelism in Many-Task Systems: The Horizons Approach

Explore related subjects

References

Author information

Authors and Affiliations

Corresponding author

Supplementary Information

ESM 1

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Subscribe and save

Buy Now