Hypergraph-partitioning-based online joint scheduling of tasks and data

Song, Yao; Wang, Liang; Xiao, Limin; Wei, Wei; Scherer, Rafał; Qin, Guangjun; Wang, Jinquan

doi:10.1007/s11227-022-04460-0

Published: 30 April 2022

The Journal of Supercomputing, Volume 78, pages 16088–16117 (2022)

Yao Song¹,² (ORCID: orcid.org/0000-0002-8167-9558),
Liang Wang¹,²,
Limin Xiao¹,²,
Wei Wei³,
Rafał Scherer⁴,
Guangjun Qin⁵ &
Jinquan Wang¹,²

Abstract

Wide-area distributed computing environments have recently become popular owing to their large resource capacity. In such environments, joint scheduling of tasks and data is the main strategy for improving system performance. However, the geographically distributed, diverse resources exhibit high variability, which makes efficient joint scheduling of tasks and data challenging to design. To adapt accurately to the dynamic variations of these resources and achieve high system performance, this study proposes a hypergraph-partitioning-based online joint scheduling method. The proposed method constructs a hypergraph of geographically distributed tasks, data, and diverse resources that clearly describes the correlation among the three elements and quantitatively reflects the time cost of the different processes in the environment. The hypergraph is dynamically updated according to the generated scheduling scheme and the collected information, so that it reflects the dynamic variations of resource states. A hypergraph partition optimization mechanism is then proposed to generate efficient joint scheduling schemes, thus reducing the overall completion time in the system. The experimental results indicate that, compared with state-of-the-art joint scheduling methods, the proposed method reduces the overall completion time by up to 25.67% and significantly reduces the task waiting time, although at some cost in data migration time.
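
The abstract does not give the construction or the optimization mechanism in detail, so the following is only a minimal sketch of the general idea and not the authors' method. It assumes a toy model in which each hyperedge groups a data item with the tasks that read it, the hyperedge weight stands in for a time cost, and each part of the partition corresponds to one site. It scores a placement with the standard connectivity-minus-one metric and uses a simple greedy heuristic in place of the paper's partition optimization. All names (`connectivity_cost`, `greedy_partition`, `d1`, `t1`, and so on) and all weights are hypothetical.

```python
def connectivity_cost(hyperedges, assignment):
    """Sum of weight * (parts spanned - 1) over all hyperedges.

    Vertices that are not yet assigned are ignored, so the function
    also works on partial assignments.
    """
    total = 0
    for members, weight in hyperedges:
        parts = {assignment[v] for v in members if v in assignment}
        total += weight * max(len(parts) - 1, 0)
    return total


def greedy_partition(vertices, hyperedges, num_parts, max_per_part):
    """Place vertices one at a time on the part that adds the least cost.

    Ties are broken by current load, then by part index. This is a simple
    illustrative heuristic, not an optimal partitioner.
    """
    assignment = {}
    load = [0] * num_parts
    for v in vertices:
        candidates = [p for p in range(num_parts) if load[p] < max_per_part]
        if not candidates:
            raise ValueError("not enough capacity for all vertices")

        def cost_if(p):
            # Cost of the partial assignment if v were placed on part p.
            return connectivity_cost(hyperedges, {**assignment, v: p})

        best = min(candidates, key=lambda p: (cost_if(p), load[p], p))
        assignment[v] = best
        load[best] += 1
    return assignment


# Hypothetical workload: two data items, four tasks, two sites.
vertices = ["d1", "t1", "t2", "d2", "t3", "t4"]
hyperedges = [
    (("d1", "t1", "t2"), 10),  # d1 is read by t1 and t2 (assumed cost 10)
    (("d2", "t3", "t4"), 8),   # d2 is read by t3 and t4 (assumed cost 8)
    (("t2", "t3"), 1),         # assumed light dependency between t2 and t3
]

assignment = greedy_partition(vertices, hyperedges, num_parts=2, max_per_part=3)
round_robin = {v: i % 2 for i, v in enumerate(vertices)}

print("greedy:", assignment, connectivity_cost(hyperedges, assignment))
print("round-robin:", round_robin, connectivity_cost(hyperedges, round_robin))
```

In this toy instance the greedy placement keeps each data item on the same site as its readers, so only the light `t2`/`t3` hyperedge is cut, whereas the round-robin placement cuts both heavy hyperedges. An online scheduler in the spirit of the abstract would additionally update the weights as resource states change and re-run the partitioning; that step is not shown here.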

Acknowledgements

This work was supported by the National Natural Science Foundation of China under Grant No. 61772053 and Grant No. 62104014, and the fund of the State Key Laboratory of Software Development Environment under Grant No. SKLSDE-2020ZX-15. This work was also supported by the Natural Science Foundation of Shaanxi Province of China (2021JM-344) and the Shaanxi Key Laboratory of Intelligent Processing for Big Energy Data (No. IPBED7).

Author information

Authors and Affiliations

State Key Laboratory of Software Development Environment, Beihang University, Beijing, 100191, China
Yao Song, Liang Wang, Limin Xiao & Jinquan Wang
School of Computer Science and Engineering, Beihang University, Beijing, 100191, China
Yao Song, Liang Wang, Limin Xiao & Jinquan Wang
School of Computer Science and Engineering, Xi’an University of Technology, Xi’an, 710048, China
Wei Wei
Institute of Computational Intelligence, Czestochowa University of Technology, Czestochowa, 42-200, Poland
Rafał Scherer
Smart City College, Beijing Union University, Beijing, 100101, China
Guangjun Qin

Corresponding authors

Correspondence to Liang Wang or Limin Xiao.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Song, Y., Wang, L., Xiao, L. et al. Hypergraph-partitioning-based online joint scheduling of tasks and data. J Supercomput 78, 16088–16117 (2022). https://doi.org/10.1007/s11227-022-04460-0

Accepted: 17 March 2022
Published: 30 April 2022
Issue Date: September 2022
DOI: https://doi.org/10.1007/s11227-022-04460-0
