Abstract
Apache Hadoop has become ubiquitous in cloud computing, which provides resources as services for multi-tenant applications. YARN (a.k.a. MapReduce 2.0) is one of the key features of second-generation Hadoop; it provides resource management and scheduling for large-scale MapReduce environments. Two major challenges for the YARN scheduler are automatically tailoring and controlling the resources allocated to different jobs so that they meet their Service Level Agreements (SLAs), and minimizing the energy consumption of the overall cloud computing system. In this work, we propose an SLA-aware, energy-efficient scheduling scheme that allocates an appropriate amount of resources to MapReduce applications running on the YARN architecture. Our task scheduling policy takes data locality into account to reduce MapReduce network traffic. Furthermore, the slack time between the actual execution time of completed tasks and the expected completion time of the application is exploited to improve the energy efficiency of the system. An online, userspace-governor-based dynamic voltage and frequency scaling (DVFS) scheme is built into YARN's per-application ApplicationMaster to dynamically change the CPU frequency for upcoming tasks, given the slack time left by previously completed tasks. Experimental evaluation shows that our proposed scheme outperforms existing MapReduce scheduling policies in terms of both resource utilization and energy efficiency.
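The abstract does not spell out the governor's decision rule, so the following is only a minimal sketch of how slack-based frequency selection could work, not the authors' algorithm. It assumes (1) each completed task is recorded as a (duration, frequency) pair, (2) a task's run time scales inversely with CPU frequency, which is a rough approximation that holds best for CPU-bound tasks, and (3) the remaining tasks run in waves of `parallel_slots` containers. The function name `pick_frequency` and all of its parameters are hypothetical.

```python
import math
from typing import Sequence, Tuple


def pick_frequency(
    deadline_s: float,
    elapsed_s: float,
    completed: Sequence[Tuple[float, float]],  # (duration_s, freq_ghz) of finished tasks
    remaining_tasks: int,
    parallel_slots: int,
    freqs_ghz: Sequence[float],
) -> float:
    """Hypothetical slack-based DVFS rule: choose the lowest available CPU
    frequency at which the remaining tasks are still projected to finish
    before the SLA deadline."""
    f_max = max(freqs_ghz)
    if remaining_tasks == 0:
        return min(freqs_ghz)  # nothing left to run: lowest frequency
    if not completed:
        return f_max  # no measurements yet: be conservative

    # Normalize each measured duration to its f_max-equivalent, assuming
    # run time is inversely proportional to frequency.
    avg_at_fmax = sum(d * f / f_max for d, f in completed) / len(completed)

    # Remaining tasks execute in waves of `parallel_slots` containers.
    waves = math.ceil(remaining_tasks / parallel_slots)
    time_left = deadline_s - elapsed_s

    for f in sorted(freqs_ghz):
        projected = waves * avg_at_fmax * (f_max / f)
        if projected <= time_left:  # enough slack to run at frequency f
            return f
    return f_max  # no slack left: run at full speed
```

For example, with a 100 s deadline, 40 s elapsed, two completed tasks of 10 s each at 2.0 GHz, 8 tasks remaining on 4 slots, and available frequencies of 1.2, 1.6, and 2.0 GHz, the two remaining waves are projected to take about 33 s even at 1.2 GHz, so the sketch picks the lowest frequency. On Linux, a frequency chosen this way would typically be applied through the cpufreq `userspace` governor by writing the target value (in kHz) to the `scaling_setspeed` file of each CPU.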
Acknowledgments
This research is sponsored by the Natural Science Foundation of China (NSFC) under Grant nos. 61202015 and 61533011, the Shandong Provincial Natural Science Foundation under Grant nos. ZR2013FM028 and ZR2015FM001, and the Fundamental Research Funds of Shandong University under Grant no. 2015JC030.
Cite this article
Cai, X., Li, F., Li, P. et al. SLA-aware energy-efficient scheduling scheme for Hadoop YARN. J Supercomput 73, 3526–3546 (2017). https://doi.org/10.1007/s11227-016-1653-7
DOI: https://doi.org/10.1007/s11227-016-1653-7