An Energy-Efficient Greedy MapReduce Scheduler for Heterogeneous Hadoop YARN Cluster

  • Conference paper
Big Data Analytics (BDA 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11297)

Abstract

Energy efficiency of a MapReduce system has become an essential part of infrastructure management in big data analytics. The Hadoop scheduler plays a vital role in ensuring the energy efficiency of the system. A handful of MapReduce scheduling algorithms have been proposed in the literature for slot-based Hadoop systems (i.e., Hadoop 0.x and Hadoop 1.x) to minimize overall energy consumption. However, YARN-based Hadoop schedulers have received little attention in the literature. In this paper, we design a scheduling model for the Hadoop YARN architecture and formulate the energy-efficient scheduling problem as an Integer Program. To solve the problem, we propose a greedy scheduler that selects the job with minimum energy consumption in each iteration. We evaluate the performance of the proposed algorithm against the FAIR and Capacity schedulers and find that our greedy scheduler yields better results for both CPU- and I/O-intensive workloads.
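The greedy idea in the abstract (pick the minimum-energy job in each iteration) can be illustrated with a minimal Python sketch. This is not the authors' algorithm or their Integer Program: the `Job` and `Node` types, the per-container power figure, and the cost model in `energy_joules` are all hypothetical assumptions introduced only for illustration, and the sketch places each job on a single node.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    containers: int    # hypothetical: number of containers the job needs
    runtime_s: float   # hypothetical: estimated runtime in seconds


@dataclass
class Node:
    name: str
    free_containers: int
    watts_per_container: float  # hypothetical: power draw per busy container


def energy_joules(job, node):
    """Assumed cost model: power per container x containers x runtime."""
    return node.watts_per_container * job.containers * job.runtime_s


def greedy_schedule(jobs, nodes):
    """Repeatedly place the feasible (job, node) pair with the lowest estimated energy."""
    pending = list(jobs)
    plan = []
    while pending:
        # Enumerate every pending job on every node that still has room for it.
        feasible = [
            (energy_joules(j, n), j, n)
            for j in pending
            for n in nodes
            if n.free_containers >= j.containers
        ]
        if not feasible:
            break  # nothing fits; a real scheduler would wait for containers to free up
        energy, job, node = min(feasible, key=lambda t: t[0])
        node.free_containers -= job.containers
        pending.remove(job)
        plan.append((job.name, node.name, energy))
    return plan, [j.name for j in pending]


if __name__ == "__main__":
    # Heterogeneous toy cluster: n1 is small but frugal, n2 is larger but power-hungry.
    cluster = [Node("n1", 2, 5.0), Node("n2", 4, 10.0)]
    workload = [Job("A", 2, 10.0), Job("B", 1, 5.0)]
    print(greedy_schedule(workload, cluster))
```

In this toy run the cheapest pair is taken first, which can push a later job onto a less efficient node. That is the usual trade-off of a greedy heuristic against an exact Integer Program solution.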

Acknowledgment

The authors would like to thank the Ministry of Electronics and IT, Govt. of India, for providing financial support for this work under the Visvesvaraya Ph.D. scheme.

Author information

Corresponding author

Correspondence to Vaibhav Pandey.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Pandey, V., Saini, P. (2018). An Energy-Efficient Greedy MapReduce Scheduler for Heterogeneous Hadoop YARN Cluster. In: Mondal, A., Gupta, H., Srivastava, J., Reddy, P., Somayajulu, D. (eds) Big Data Analytics. BDA 2018. Lecture Notes in Computer Science, vol 11297. Springer, Cham. https://doi.org/10.1007/978-3-030-04780-1_19

  • DOI: https://doi.org/10.1007/978-3-030-04780-1_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-04779-5

  • Online ISBN: 978-3-030-04780-1

  • eBook Packages: Computer Science, Computer Science (R0)
