Energy-aware scheduling for spark job based on deep reinforcement learning in cloud

  • Regular Paper
Computing

Abstract

As big data frameworks such as Storm, Spark and Hadoop are widely deployed in commercial and research applications, the energy consumption of the cloud data centers that support these platforms is becoming increasingly prominent. However, job scheduling is a complex problem in the presence of various service level agreement (SLA) goals, such as cost reduction and job performance improvement. The highly heterogeneous nature of clusters and the variability of resource requirements across workloads make energy-efficient scheduling on big data platforms extremely complex under SLA constraints. Existing performance-based models and heuristic scheduling methods rely excessively on historical data and are difficult to optimize or adapt when loads and clusters change. In this paper, we construct an energy consumption model based on resource utilization and a reinforcement learning model for energy-efficient scheduling under SLA constraints for Spark clusters, and design two Deep Reinforcement Learning (DRL) algorithms. The cluster scheduler designed and implemented on this model automatically captures different load characteristics and inherent cluster characteristics, finds an appropriate executor creation policy for resource allocation, and reduces cluster energy consumption under a constraint on job execution time. Experimental results show that the proposed DRL scheduler reduces energy consumption by up to about 33% under different load characteristics.
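
The abstract does not state the energy model or the reward used by the scheduler, so the following is only a minimal sketch of what a utilization-based energy model and an SLA-constrained reward could look like. It assumes the widely used linear power model (idle power plus a utilization-proportional term); the names `Node`, `node_power`, `cluster_energy` and `reward`, the wattage figures, and the penalty value are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical worker node; the power figures used with it are illustrative."""
    p_idle_w: float  # power draw at 0% CPU utilization, in watts
    p_max_w: float   # power draw at 100% CPU utilization, in watts


def node_power(node: Node, utilization: float) -> float:
    """Linear utilization-based power model (an assumption, not the paper's formula)."""
    u = min(max(utilization, 0.0), 1.0)  # clamp utilization to [0, 1]
    return node.p_idle_w + (node.p_max_w - node.p_idle_w) * u


def cluster_energy(nodes, utilization_trace, dt_s: float) -> float:
    """Energy in joules, summed over all nodes and all time steps.

    utilization_trace[t][i] is the utilization of nodes[i] during step t,
    and dt_s is the length of one step in seconds.
    """
    return sum(
        node_power(node, u) * dt_s
        for step in utilization_trace
        for node, u in zip(nodes, step)
    )


def reward(energy_j: float, runtime_s: float, deadline_s: float,
           penalty: float = 1e6) -> float:
    """Hypothetical reward: less energy is better; a missed deadline is penalized."""
    r = -energy_j
    if runtime_s > deadline_s:
        r -= penalty  # SLA violation: job ran past its execution-time constraint
    return r


if __name__ == "__main__":
    nodes = [Node(p_idle_w=100.0, p_max_w=200.0), Node(p_idle_w=50.0, p_max_w=150.0)]
    trace = [[0.5, 0.0], [1.0, 1.0]]  # two 10-second steps
    energy = cluster_energy(nodes, trace, dt_s=10.0)
    print(energy, reward(energy, runtime_s=20.0, deadline_s=30.0))
```

In a setup like this, an executor creation policy that packs work onto fewer or more efficient nodes lowers the energy term, while the penalty term discourages placements that stretch the job past its deadline. The actual state, action and reward definitions of the two DRL algorithms are given in the full paper, not here.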

Availability of data and materials

The datasets generated during the current study are available from the corresponding author on reasonable request.

Acknowledgements

This work was supported by the Chongqing Science and Technology Commission Project (Grant No: cstc2018jcyjAX0525) and the Key Research and Development Projects of the Sichuan Science and Technology Department (Grant No: 2019YFG0107).

Funding

This work was supported by the Chongqing Science and Technology Commission Project (Grant No: cstc2018jcyjAX0525; Recipient: Hongjian Li) and the Key Research and Development Projects of the Sichuan Science and Technology Department (Grant No: 2019YFG0107; Recipient: Hongjian Li).

Author information

Contributions

HL: Proposed the idea, conducted the experiments, wrote the manuscript. LL: Proposed the idea, conducted the experiments, wrote the manuscript. WS: Proposed the idea, conducted the experiments, wrote the manuscript. GT: Helped write several sections of the manuscript, proofreading. HL: Helped write several sections of the manuscript, proofreading.

Corresponding author

Correspondence to Hongjian Li.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Li, H., Lu, L., Shi, W. et al. Energy-aware scheduling for spark job based on deep reinforcement learning in cloud. Computing 105, 1717–1743 (2023). https://doi.org/10.1007/s00607-023-01171-z

  • DOI: https://doi.org/10.1007/s00607-023-01171-z
