Energy-aware scheduling for spark job based on deep reinforcement learning in cloud

Li, Hongjian; Lu, Liang; Shi, Wenhu; Tan, Gangfan; Luo, Hao

doi:10.1007/s00607-023-01171-z

Regular Paper
Published: 08 March 2023

Computing, Volume 105, pages 1717–1743 (2023)

Hongjian Li ORCID: orcid.org/0000-0002-8558-0838¹,
Liang Lu¹,
Wenhu Shi¹,
Gangfan Tan¹ &
Hao Luo¹

Abstract

Big data frameworks such as Storm, Spark and Hadoop are widely deployed in commercial and research applications, and the energy consumption of the cloud data centers that support these big data processing platforms is becoming increasingly prominent. However, job scheduling is a complex problem in the presence of various service level agreement (SLA) goals, such as cost reduction and job performance improvement. The highly heterogeneous nature of clusters and the variability of resource requirements across workloads make energy-efficient scheduling on big data platforms extremely complex under SLA constraints. Existing performance-based models and heuristic scheduling methods rely excessively on historical data and are difficult to optimize or adapt to changes in load and clusters. In this paper, we construct an energy consumption model based on resource utilization and a reinforcement learning model for energy-efficient scheduling under SLA constraints for Spark clusters, and design two Deep Reinforcement Learning (DRL) algorithms. The cluster scheduler designed and implemented based on this model can automatically capture different load characteristics and inherent cluster characteristics, find an appropriate executor creation policy for resource allocation, and reduce cluster energy consumption under the constraint of job execution time. Experimental results show that the proposed DRL scheduler saves up to about 33% of energy under different load characteristics.
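
As a rough illustration of what a utilization-based energy model can look like, the sketch below uses the commonly cited linear power model, in which a node's power draw rises linearly from an idle value to a peak value as CPU utilization goes from 0 to 1. This is an assumption made for illustration and is not taken from the paper; the function names `node_power` and `cluster_energy` and the wattage defaults are hypothetical.

```python
def node_power(utilization, p_idle=100.0, p_max=250.0):
    """Power draw in watts under a linear model (an assumption, not the paper's model).

    utilization: CPU utilization of one node, in [0, 1].
    p_idle, p_max: hypothetical idle and peak power of the node, in watts.
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be in [0, 1]")
    return p_idle + (p_max - p_idle) * utilization


def cluster_energy(samples, interval_s):
    """Approximate cluster energy in joules from sampled utilizations.

    samples: list of time steps; samples[t][n] is the utilization of node n
             at step t.
    interval_s: sampling interval in seconds, assumed constant.
    """
    return sum(node_power(u) * interval_s for step in samples for u in step)


if __name__ == "__main__":
    # Two nodes observed at two time steps, 10 seconds apart.
    trace = [[0.0, 1.0], [0.5, 0.5]]
    print(cluster_energy(trace, interval_s=10.0))
```

Under such a model, a scheduler that packs executors so that fewer nodes stay busy for the same work can lower the summed energy, provided the job execution time constraint is still met.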

Availability of data and materials

The datasets generated during the current study are available from the corresponding author on reasonable request.

Acknowledgements

This work was supported by the Chongqing Science and Technology Commission Project (Grant No. cstc2018jcyjAX0525) and the Key Research and Development Projects of the Sichuan Science and Technology Department (Grant No. 2019YFG0107).

Funding

This work was supported by the Chongqing Science and Technology Commission Project (Grant No. cstc2018jcyjAX0525; Recipient: Hongjian Li) and the Key Research and Development Projects of the Sichuan Science and Technology Department (Grant No. 2019YFG0107; Recipient: Hongjian Li).

Author information

Authors and Affiliations

Department of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
Hongjian Li, Liang Lu, Wenhu Shi, Gangfan Tan & Hao Luo

Contributions

HL: proposed the idea, conducted the experiments, and wrote the manuscript. LL: proposed the idea, conducted the experiments, and wrote the manuscript. WS: proposed the idea, conducted the experiments, and wrote the manuscript. GT: helped write several sections of the manuscript and proofread it. HL: helped write several sections of the manuscript and proofread it.

Corresponding author

Correspondence to Hongjian Li.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Li, H., Lu, L., Shi, W. et al. Energy-aware scheduling for spark job based on deep reinforcement learning in cloud. Computing 105, 1717–1743 (2023). https://doi.org/10.1007/s00607-023-01171-z

Received: 07 September 2022
Accepted: 25 February 2023
Published: 08 March 2023
Issue Date: August 2023
DOI: https://doi.org/10.1007/s00607-023-01171-z

Keywords

Mathematics Subject Classification

68T07

Part of a collection:

Computer Science SDG 7: Affordable and Clean Energy
