A YARN-based Energy-Aware Scheduling Method for Big Data Applications under Deadline Constraints

Journal of Grid Computing

Abstract

Hadoop is a distributed framework for processing big data. One of its critical components is YARN, which handles scheduling and resource management. A scheduling algorithm should consider multiple objectives, yet the YARN schedulers take neither the Service Level Agreement (SLA) nor energy consumption into account. This paper proposes an energy-efficient, deadline-aware model for the scheduling problem. Scheduling applications so that they meet their deadlines while reducing energy consumption is an NP-hard problem. Hence, an Energy-efficient Deadline-aware Scheduling Algorithm based on the Moth-Flame Optimization algorithm (EDSA-MFO) is proposed to minimize energy consumption while executing each application within a given soft deadline. Moreover, an earliest-deadline-first-based (EDF-based) heuristic is proposed to decode a moth into a scheduling solution. The algorithm is implemented for both static and dynamic scheduling. Extensive simulations are conducted to evaluate its performance. The results show that the proposed method can find near-optimal schedules. It outperforms the YARN default FIFO scheduler, EDF, the energy-aware greedy algorithm (EAGA), and the deadline-aware energy-efficient MapReduce scheduling algorithm for YARN (EMRSAY) in both total cluster energy consumption and meeting job deadlines.
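To make the idea of decoding a moth with an EDF-based heuristic more concrete, the following Python sketch may help. It is not the authors' EDSA-MFO: the abstract does not specify the encoding, the energy model, or any parameters, so everything here is an assumption made for illustration. In particular, the `Job` and `Node` fields, the mapping of each moth coordinate in [0, 1) to a node index, the "power times runtime" energy model, and the penalty weight for missed deadlines are hypothetical. Only the spiral position update and the shrinking flame count follow the standard Moth-Flame Optimization scheme.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    work: float      # abstract work units (hypothetical model)
    deadline: float  # soft deadline, in schedule time units


@dataclass(frozen=True)
class Node:
    speed: float  # work units processed per time unit (hypothetical)
    power: float  # power drawn while busy (hypothetical)


def decode_edf(moth, jobs, nodes):
    """Decode a moth into a schedule: visit jobs in earliest-deadline-first
    order and place job i on the node selected by moth[i] in [0, 1)."""
    free_at = [0.0] * len(nodes)  # time at which each node becomes idle
    energy, missed = 0.0, 0
    for i in sorted(range(len(jobs)), key=lambda j: jobs[j].deadline):
        n = min(int(moth[i] * len(nodes)), len(nodes) - 1)
        runtime = jobs[i].work / nodes[n].speed
        free_at[n] += runtime
        energy += nodes[n].power * runtime
        if free_at[n] > jobs[i].deadline:
            missed += 1
    return energy, missed


def fitness(moth, jobs, nodes, penalty=1000.0):
    """Energy plus an assumed penalty per missed soft deadline."""
    energy, missed = decode_edf(moth, jobs, nodes)
    return energy + penalty * missed


def mfo(jobs, nodes, n_moths=20, iters=100, b=1.0, seed=0):
    """Minimal Moth-Flame Optimization loop over moths in [0, 1)^len(jobs)."""
    rng = random.Random(seed)
    dim = len(jobs)
    moths = [[rng.random() for _ in range(dim)] for _ in range(n_moths)]
    flames = []
    for it in range(iters):
        # Flames are the best positions seen so far (old flames + current moths).
        pool = flames + [m[:] for m in moths]
        pool.sort(key=lambda m: fitness(m, jobs, nodes))
        flames = pool[:n_moths]
        # The number of active flames shrinks linearly towards 1.
        n_flames = round(n_moths - it * (n_moths - 1) / iters)
        r = -1.0 - it / iters  # lower bound of t moves from -1 towards -2
        for k, moth in enumerate(moths):
            flame = flames[min(k, n_flames - 1)]
            for d in range(dim):
                t = (r - 1.0) * rng.random() + 1.0  # t drawn from [r, 1]
                dist = abs(flame[d] - moth[d])
                # Logarithmic spiral around the flame.
                pos = dist * math.exp(b * t) * math.cos(2 * math.pi * t) + flame[d]
                moth[d] = min(max(pos, 0.0), 1.0 - 1e-9)  # clamp to [0, 1)
    best = min(flames, key=lambda m: fitness(m, jobs, nodes))
    return best, decode_edf(best, jobs, nodes)
```

A call such as `mfo([Job("a", 10, 20), Job("b", 10, 5)], [Node(1, 1), Node(2, 3)])` returns the best moth found together with its decoded energy and missed-deadline count. Folding deadline misses into a single penalized objective is one simple way to treat the deadline as soft; the paper may handle it differently.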

Data Availability

The datasets generated and/or analyzed during the current study are available from the first author on reasonable request.

Author information

Corresponding author

Correspondence to Nima Jafari Navimipour.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

Shabestari, F., Rahmani, A.M., Navimipour, N.J. et al. A YARN-based Energy-Aware Scheduling Method for Big Data Applications under Deadline Constraints. J Grid Computing 20, 38 (2022). https://doi.org/10.1007/s10723-022-09627-w

  • DOI: https://doi.org/10.1007/s10723-022-09627-w
