Abstract
Big data frameworks such as Storm, Spark and Hadoop are widely deployed in commercial and research applications, and the energy consumption of the cloud data centers that host these platforms is becoming increasingly prominent. Job scheduling, however, is a complex problem in the presence of multiple service level agreement (SLA) goals, such as cost reduction and job performance improvement. The highly heterogeneous nature of clusters and the variability of resource requirements across workloads make energy-efficient scheduling on big data platforms extremely complex under SLA constraints. Existing performance-based models and heuristic scheduling methods rely excessively on historical data and are difficult to optimize or adapt when the load or the cluster changes. In this paper, we construct an energy consumption model based on resource utilization and a reinforcement learning model for energy-efficient scheduling of Spark clusters under SLA constraints, and design two Deep Reinforcement Learning (DRL) algorithms. The cluster scheduler designed and implemented on this model automatically captures different load characteristics and inherent cluster characteristics, finds an appropriate executor creation policy for resource allocation, and reduces cluster energy consumption subject to a constraint on job execution time. Experimental results show that the proposed DRL scheduler reduces energy consumption by up to about 33% under different load characteristics.
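To make the idea of a utilization-based energy model and an SLA-constrained reward more concrete, the following is a minimal sketch, not the model used in the paper. It assumes the commonly used linear power model (power grows linearly with CPU utilization between an idle and a peak value) and a hypothetical reward that penalizes deadline misses. The names `Node`, `node_power`, `energy_joules`, `reward` and the parameter `penalty` are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Node:
    """Hypothetical worker node described by its idle and peak power draw."""
    p_idle: float  # watts at 0% CPU utilization (assumed value)
    p_max: float   # watts at 100% CPU utilization (assumed value)


def node_power(node: Node, cpu_util: float) -> float:
    """Linear power model: P(u) = P_idle + (P_max - P_idle) * u, u in [0, 1]."""
    u = min(max(cpu_util, 0.0), 1.0)  # clamp out-of-range samples
    return node.p_idle + (node.p_max - node.p_idle) * u


def energy_joules(node: Node, utils: Sequence[float], dt: float) -> float:
    """Approximate energy as the sum of power samples times the sampling interval."""
    return sum(node_power(node, u) for u in utils) * dt


def reward(energy: float, runtime: float, deadline: float,
           penalty: float = 1000.0) -> float:
    """Assumed reward shape: lower energy is better, deadline misses are penalized."""
    r = -energy
    if runtime > deadline:
        r -= penalty
    return r


if __name__ == "__main__":
    node = Node(p_idle=100.0, p_max=200.0)
    e = energy_joules(node, [0.0, 0.5, 1.0], dt=10.0)
    print(e, reward(e, runtime=12.0, deadline=10.0))
```

In a sketch like this, a scheduler agent would prefer executor placements that keep the summed energy low while still finishing each job before its deadline; the actual state, action and reward definitions in the paper may differ.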
Availability of data and materials
The datasets generated during the current study are available from the corresponding author on reasonable request.
Acknowledgements
This work was supported by the Chongqing Science and Technology Commission Project (Grant No: cstc2018jcyjAX0525) and the Key Research and Development Projects of Sichuan Science and Technology Department (Grant No: 2019YFG0107).
Funding
This work was supported by the Chongqing Science and Technology Commission Project (Grant No: cstc2018jcyjAX0525; Recipient: Hongjian Li) and the Key Research and Development Projects of Sichuan Science and Technology Department (Grant No: 2019YFG0107; Recipient: Hongjian Li).
Author information
Contributions
HL: proposed the idea, performed experiments, wrote the manuscript. LL: proposed the idea, performed experiments, wrote the manuscript. WS: proposed the idea, performed experiments, wrote the manuscript. GT: helped write several sections of the manuscript, proofreading. HL: helped write several sections of the manuscript, proofreading.
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
About this article
Cite this article
Li, H., Lu, L., Shi, W. et al. Energy-aware scheduling for spark job based on deep reinforcement learning in cloud. Computing 105, 1717–1743 (2023). https://doi.org/10.1007/s00607-023-01171-z
DOI: https://doi.org/10.1007/s00607-023-01171-z