An Energy-Efficient Greedy MapReduce Scheduler for Heterogeneous Hadoop YARN Cluster

Pandey, Vaibhav; Saini, Poonam

doi:10.1007/978-3-030-04780-1_19

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11297)

Included in the following conference series:

International Conference on Big Data Analytics

Abstract

Energy efficiency has become an essential concern in managing MapReduce infrastructure for big data analytics, and the Hadoop scheduler plays a vital role in achieving it. Several MapReduce scheduling algorithms have been proposed for slot-based Hadoop systems (i.e., Hadoop 0.x and Hadoop 1.x) to minimize overall energy consumption. However, schedulers for YARN-based Hadoop have received little attention in the literature. In this paper, we design a scheduling model for the Hadoop YARN architecture and formulate the energy-efficient scheduling problem as an integer program. To solve the problem, we propose a greedy scheduler that, in each iteration, selects the job with the minimum energy consumption. We evaluate the proposed algorithm against the Fair and Capacity schedulers and find that our greedy scheduler shows better results for both CPU-intensive and I/O-intensive workloads.
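The greedy selection rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm, which is specified in the full paper. It assumes that an energy estimate is already known for each job on each node and that every job needs exactly one container. The names `Job`, `energy_by_node`, and `greedy_schedule` are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Job:
    # Hypothetical job record: estimated energy (joules) per candidate node.
    name: str
    energy_by_node: Dict[str, float]


def greedy_schedule(jobs: List[Job],
                    free_containers: Dict[str, int]) -> List[Tuple[str, str, float]]:
    """In each iteration, place the pending job with the lowest estimated energy.

    Assumption: one container per job; energy estimates are given, not measured.
    """
    pending = {job.name: job for job in jobs}
    free = dict(free_containers)  # copy so the caller's dict is not mutated
    plan = []
    while pending:
        # Every (energy, job, node) option that still has a free container.
        candidates = [
            (energy, job.name, node)
            for job in pending.values()
            for node, energy in job.energy_by_node.items()
            if free.get(node, 0) > 0
        ]
        if not candidates:
            break  # cluster is full; remaining jobs stay pending
        energy, name, node = min(candidates)
        plan.append((name, node, energy))
        free[node] -= 1
        del pending[name]
    return plan


jobs = [
    Job("wordcount", {"n1": 50.0, "n2": 40.0}),
    Job("sort", {"n1": 30.0, "n2": 45.0}),
    Job("grep", {"n1": 20.0, "n2": 25.0}),
]
plan = greedy_schedule(jobs, {"n1": 1, "n2": 2})
total_energy = sum(energy for _, _, energy in plan)
```

In this toy run the cheapest option (`grep` on `n1`) is taken first, which uses up the only container on `n1` and forces `sort` onto its more expensive node. A greedy choice is fast but not guaranteed to minimize total energy, which is the usual trade-off against solving an integer program exactly.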

Acknowledgment

The authors would like to thank the Ministry of Electronics and IT, Govt. of India, for providing financial support for this work under the Visvesvaraya Ph.D. scheme.

Author information

Authors and Affiliations

Department of CSE, Punjab Engineering College (Deemed to be University), Chandigarh, India
Vaibhav Pandey & Poonam Saini

Corresponding author

Correspondence to Vaibhav Pandey.

Editor information

Editors and Affiliations

Ashoka University, Sonepat, India
Anirban Mondal
IBM Research - India, New Delhi, India
Himanshu Gupta
University of Minnesota, Minneapolis, MN, USA
Jaideep Srivastava
IIIT, Hyderabad, India
P. Krishna Reddy
National Institute of Technology, Warangal, India
D.V.L.N. Somayajulu

About this paper

Cite this paper

Pandey, V., Saini, P. (2018). An Energy-Efficient Greedy MapReduce Scheduler for Heterogeneous Hadoop YARN Cluster. In: Mondal, A., Gupta, H., Srivastava, J., Reddy, P., Somayajulu, D. (eds) Big Data Analytics. BDA 2018. Lecture Notes in Computer Science, vol 11297. Springer, Cham. https://doi.org/10.1007/978-3-030-04780-1_19

DOI: https://doi.org/10.1007/978-3-030-04780-1_19
Published: 22 November 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-04779-5
Online ISBN: 978-3-030-04780-1
eBook Packages: Computer Science, Computer Science (R0)
