SLA-aware energy-efficient scheduling scheme for Hadoop YARN

The Journal of Supercomputing

Abstract

Apache Hadoop has become ubiquitous in cloud computing, which provides resources as services for multi-tenant applications. YARN (a.k.a. MapReduce 2.0) is one of the key features of second-generation Hadoop, providing resource management and scheduling for large-scale MapReduce environments. Two major challenges for the YARN scheduler are to automatically tailor and control resource allocations to different jobs so that they meet their Service Level Agreements (SLAs), and to minimize the energy consumption of the overall cloud computing system. In this work, we propose an SLA-aware energy-efficient scheduling scheme which allocates an appropriate amount of resources to MapReduce applications in the YARN architecture. Our task scheduling policy considers data locality information to reduce MapReduce network traffic. Furthermore, the slack time between the actual execution time of completed tasks and the expected completion time of the application is utilized to improve the energy efficiency of the system. An online userspace governor-based dynamic voltage and frequency scaling (DVFS) scheme is designed in the YARN per-application ApplicationMaster to dynamically change the CPU frequency for upcoming tasks given the slack time from previously completed tasks. Experimental evaluation shows that our proposed scheme outperforms existing MapReduce scheduling policies in terms of both resource utilization and energy efficiency.
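The slack-driven frequency selection described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm. It assumes that a task's execution time scales inversely with CPU frequency (a simplification that fits CPU-bound tasks only), and the function names `next_task_budget` and `pick_frequency`, the frequency list, and the timing values are all hypothetical.

```python
from typing import Sequence


def next_task_budget(planned_time: float,
                     expected_elapsed: float,
                     actual_elapsed: float) -> float:
    """Time budget for the next task: its planned time plus any slack
    left over by tasks that finished earlier than expected."""
    slack = max(0.0, expected_elapsed - actual_elapsed)
    return planned_time + slack


def pick_frequency(est_time_at_fmax: float,
                   time_budget: float,
                   freqs_mhz: Sequence[int]) -> int:
    """Return the lowest available frequency at which the task is still
    expected to finish within its budget.

    Assumption (hypothetical model): time at frequency f equals
    est_time_at_fmax * f_max / f.
    """
    if not freqs_mhz:
        raise ValueError("at least one frequency is required")
    f_max = max(freqs_mhz)
    for f in sorted(freqs_mhz):
        if est_time_at_fmax * f_max / f <= time_budget:
            return f
    # No frequency meets the budget, so run as fast as possible.
    return f_max


# Illustrative values: completed tasks were expected to take 30 s but took 24 s,
# leaving 6 s of slack for a task planned at 10 s.
budget = next_task_budget(10.0, 30.0, 24.0)
chosen = pick_frequency(10.0, budget, [1200, 1600, 2000, 2400])
```

With these illustrative numbers the budget is 16 s, so the sketch selects 1600 MHz, the lowest frequency whose projected time (15 s) still fits. When no slack is available, it falls back to the maximum frequency.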

Acknowledgments

This research is sponsored by the Natural Science Foundation of China (NSFC) under Grant nos. 61202015 and 61533011, the Shandong Provincial Natural Science Foundation under Grant nos. ZR2013FM028 and ZR2015FM001, and the Fundamental Research Funds of Shandong University under Grant no. 2015JC030.

Author information

Correspondence to Lei Ju.

Cite this article

Cai, X., Li, F., Li, P. et al. SLA-aware energy-efficient scheduling scheme for Hadoop YARN. J Supercomput 73, 3526–3546 (2017). https://doi.org/10.1007/s11227-016-1653-7
