A Task-Based Greedy Scheduling Algorithm for Minimizing Energy of MapReduce Jobs

Journal of Grid Computing

Abstract

MapReduce and its open-source implementation, Hadoop, have gained widespread adoption for parallel processing of big data jobs. As the number of such jobs is rising rapidly, reducing their energy consumption is increasingly important for lowering both environmental impact and operational costs. Prior work by Mashayekhy et al. (IEEE Trans. Parallel Distributed Syst. 26, 2720–2733, 2015) tackled the problem of energy-aware scheduling of a single MapReduce job; in this paper we provide a far more efficient heuristic. We first model the problem as an Integer Linear Program (ILP) so that the optimal solution can be found using ILP solvers. We then present a task-based greedy scheduling algorithm, TGSAVE, which selects a slot for each task so as to minimize the total energy consumption of a MapReduce job for big data applications in heterogeneous environments, without significant performance loss and while satisfying the service level agreement (SLA). We perform several experiments on a Hadoop cluster to measure task characteristics for nine different applications and use them to evaluate the proposed algorithm. The results show that the total energy consumption of MapReduce jobs scheduled by TGSAVE is up to 35% less than that achieved by EMRSA, the closest rival, proposed by Mashayekhy et al. (IEEE Trans. Parallel Distributed Syst. 26, 2720–2733, 2015), for the same workloads. Moreover, TGSAVE can find a solution, in the same order of time, for deadlines up to 74% tighter than the tightest deadline for which EMRSA can find a feasible solution. On average, the TGSAVE solution is within approximately 1.4% of the optimal solution, and it can meet deadlines as tight as 12%, on average, above the energy-oblivious minimum makespan in the benchmarks we examined.
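The idea of greedily picking a slot per task under a deadline can be illustrated with a minimal sketch. This is not TGSAVE itself, whose details are not given in the abstract. It is a generic greedy heuristic under stated assumptions: the per-task, per-slot `energy` and `runtime` tables are hypothetical profiling data, tasks on one slot run back to back, the SLA is a single `deadline`, and the map/reduce phase ordering is ignored. The function name `greedy_energy_schedule` is likewise an assumption made for illustration.

```python
def greedy_energy_schedule(energy, runtime, deadline):
    """Assign each task to one slot, preferring low energy, within a deadline.

    energy[t][s] and runtime[t][s] are hypothetical profiled values for
    running task t on slot s. Returns (assignment, total_energy), where
    assignment is a sorted list of (task, slot) pairs, or None when this
    greedy pass cannot place every task before the deadline.
    """
    n_slots = len(energy[0])
    busy = [0.0] * n_slots  # accumulated runtime already placed on each slot
    assignment = []
    total_energy = 0.0

    # Heuristic ordering (an assumption): place the tasks whose best-case
    # runtime is longest first, since they are the hardest to fit.
    order = sorted(range(len(energy)), key=lambda t: -min(runtime[t]))

    for t in order:
        # Slots on which the task would still finish by the deadline.
        feasible = [s for s in range(n_slots)
                    if busy[s] + runtime[t][s] <= deadline]
        if not feasible:
            return None  # the greedy pass failed; a schedule may still exist
        # Among feasible slots, take the one with the lowest energy cost.
        best = min(feasible, key=lambda s: energy[t][s])
        busy[best] += runtime[t][best]
        total_energy += energy[t][best]
        assignment.append((t, best))

    return sorted(assignment), total_energy


if __name__ == "__main__":
    # Two heterogeneous slots: slot 0 is fast but power-hungry, slot 1 is
    # slow but frugal (made-up numbers).
    energy = [[5, 2], [4, 1], [3, 3]]
    runtime = [[2, 4], [2, 4], [2, 3]]
    print(greedy_energy_schedule(energy, runtime, deadline=6))
```

As the deadline tightens, fewer tasks fit on the frugal slot and the total energy rises. A greedy pass like this can also report failure even when a feasible schedule exists, which is why an exact ILP formulation is useful as a reference point.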

References

  1. Annual Energy Review: Tech. rep. (2012). http://www.eia.gov/totalenergy/data/annual/pdf/aer.pdf (2011)

  2. Adaptive Computing, I.: TORQUE Resource Manager. http://www.adaptivecomputing.com/products/open-source/torque/

  3. Ahmad, F., Chakradhar, S., Raghunathan, A., Vijaykumar, T.N.: Tarazu: Optimizing MapReduce on Heterogeneous Clusters. In: Proceedings of the 17th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pp. 61–74 (2012)

  4. Ahmad, F., Lee, S., Thottethodi, M., Vijaykumar, T.N.: PUMA: Purdue MapReduce Benchmarks Suite (2012)

  5. Anjos, J.C.S., Carrera, I., Kolberg, W., Tibola, A.L., Arantes, L.B., Geyer, C.R.: MRA++: scheduling and data placement on MapReduce for heterogeneous environments. Futur Gener Comput Syst 42, 22–35 (2015)

  6. Apache: Capacity Scheduler for Hadoop. https://hadoop.apache.org/docs/r1.2.1/capacity_scheduler.html

  7. Apache: Hadoop. Hadoop.Apache.org (2016)

  8. Apache: Hadoop Fair Scheduler. hadoop.apache.org/docs/r1.2.1/fair_scheduler.html (2016)

  9. Apache: HOD Scheduler. https://hadoop.apache.org/docs/r1.2.1/capacity_scheduler.html (2016)

  10. Bryk, P., Malawski, M., Juve, G., Deelman, E.: Storage-aware algorithms for scheduling of workflow ensembles in clouds. Journal of Grid Computing 14(2), 359–378 (2016)

  11. Cho, B., Rahman, M., Chajed, T., Gupta, I.: Natjam: eviction policies for supporting priorities and deadlines in mapreduce clusters (2013)

  12. Cisco: Cisco Global Cloud Index: Forecast and Methodology, 2014–2019. Tech. rep. (2014)

  13. Dean, J., Ghemawat, S.: MapReduce: Simplified Data Processing on Large Clusters. Commun. ACM 51, 107–113 (2008)

  14. Ebrahimirad, V., Goudarzi, M., Rajabi, A.: Energy-aware scheduling for precedence-constrained parallel virtual machines in virtualized data centers. Journal of Grid Computing 13(2), 233–253 (2015)

  15. Fredman, M., Tarjan, R.: Fibonacci heaps and their uses in improved network optimization algorithms. J. Assoc. Comput. Mach. 34, 596–615 (1987)

  16. Guo, Z., Fox, G.: Improving MapReduce performance in heterogeneous network environments and resource utilization. In: Proceedings - 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGrid 2012, pp. 714–716 (2012)

  17. Kansal, N.J., Chana, I.: Energy-aware virtual machine migration for cloud computing-a firefly optimization approach. Journal of Grid Computing 14(2), 327–345 (2016)

  18. Kim, H., Ahn, J. H., Kim, J.: Exploiting replicated cache blocks to reduce L2 cache leakage in CMPs. IEEE Transactions on Very Large Scale Integration (VLSI) Systems (10), 1863–1877 (2013)

  19. Krish, K., Anwar, A., Butt, A.R.: ϕSched: A Heterogeneity-Aware Hadoop Workflow Scheduler. In: 2014 IEEE 22nd International Symposium on Modelling, Analysis & Simulation of Computer and Telecommunication Systems, pp. 255–264 (2014)

  20. Lang, W., Patel, J.M.: Energy management for MapReduce clusters. Proceedings of the VLDB Endowment 3, 129–139 (2010)

  21. Leverich, J., Kozyrakis, C.: On the energy (in)efficiency of Hadoop clusters. ACM SIGOPS Operating Systems Review 44, 61–65 (2010)

  22. Marszałkowski, J.M., Drozdowski, M., Marszałkowski, J.: Time and energy performance of parallel systems with hierarchical memory. Journal of Grid Computing 14(1), 153–170 (2016)

  23. Mashayekhy, L., Movahed Nejad, M., Grosu, D., Zhang, Q., Shi, W.: Energy-aware Scheduling of MapReduce Jobs for Big Data Applications. IEEE Transactions on Parallel and Distributed Systems 26, 2720–2733 (2015)

  24. Meisner, D., Gold, B.T., Wenisch, T.F.: PowerNap. ACM SIGARCH Computer Architecture News 37, 205 (2009)

  25. Nabavinejad, S.M., Goudarzi, M., Abedi, S.: MapReduce Service Provisioning for Frequent Jobs on Green Clouds Considering Data Transfers. Technical Report, Computer Engineering Department Sharif University of Technology (2016)

  26. Pereira, R., Couto, M., Ribeiro, F., Rua, R., Cunha, J., Fernandes, J.P., Saraiva, J.: Energy efficiency across programming languages: how do energy, time, and memory relate? In: Proceedings of the 10th ACM SIGPLAN International Conference on Software Language Engineering, pp. 256–267. ACM (2017)

  27. Powell, M.D., Yang, S.H., Falsafi, B., Roy, K., Vijaykumar, T.N.: An Energy-Efficient High-Performance Deep-Submicron instruction cache. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, pp. 1–13 (2001)

  28. Rasooli, A., Down, D.G.: Guidelines for selecting hadoop schedulers based on system heterogeneity. Journal of Grid Computing 12(3), 499–519 (2014)

  29. Sueur, E.L., Heiser, G.: Dynamic voltage and frequency scaling: The laws of diminishing returns. In: Proceedings of the 2010 international conference on Power aware computing and systems, pp. 1–8 (2010)

  30. Tang, Z., Qi, L., Cheng, Z., Li, K., Khan, S.U., Li, K.: An energy-efficient task scheduling algorithm in dvfs-enabled cloud environment. Journal of Grid Computing 14(1), 55–74 (2016)

  31. Tavarageri, S., Sadayappan, P.: A compiler analysis to determine useful cache size for energy efficiency. In: 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum, pp. 923–930 (2013)

  32. Tian, C., Zhou, H., He, Y., Zha, L.: A dynamic mapreduce scheduler for heterogeneous workloads. In: 2009 Eighth International Conference on Grid and Cooperative Computing, pp. 218–224 (2009)

  33. Wang, Y., Lu, W., Lou, R., Wei, B.: Improving mapreduce performance with partial speculative execution. Journal of Grid Computing 13(4), 587–604 (2015)

  34. White, T.: Hadoop: The Definitive Guide. O’Reilly Media, Inc. (2012)

  35. Wolf, J., Rajan, D., Hildrum, K., Khandekar, R., Kumar, V., Parekh, S., Wu, K.L., Balmin, A.: FLEX: a slot allocation scheduling optimizer for MapReduce workloads. In: ACM/IFIP/USENIX International Conference on Distributed Systems Platforms and Open Distributed Processing, pp. 1–20 (2010)

  36. Xie, J., Yin, S., Ruan, X., Ding, Z., Tian, Y., Majors, J., Manzanares, A., Qin, X.: Improving MapReduce performance through data placement in heterogeneous Hadoop clusters. In: 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW), pp. 29–42 (2010)

  37. Yan, F., Cherkasova, L., Zhang, Z., Smirni, E.: DyScale: a mapreduce job scheduler for heterogeneous multicore processors (2015)

  38. Yang, S.J., Chen, Y.R.: Design adaptive task allocation scheduler to improve MapReduce performance in heterogeneous clouds. J. Netw. Comput. Appl. 57, 61–70 (2015)

  39. Yigitbasi, N., Datta, K., Jain, N., Willke, T.: Energy efficient scheduling of MapReduce workloads on heterogeneous clusters. In: Proceedings of the 2nd International Workshop - GCM ’11 on Green Computing Middleware, pp. 1–6 (2011)

  40. Zaharia, M., Konwinski, A., Joseph, A.D., Katz, R., Stoica, I.: Improving mapreduce performance in heterogeneous environments. Proceedings of the USENIX OSDI, pp. 8 (2008)

  41. Zhang, Q., Zhani, M.F., Boutaba, R., Hellerstein, J.L.: Dynamic heterogeneity-aware resource provisioning in the cloud. In: 2013 IEEE 33rd International Conference on Distributed Computing Systems (ICDCS), pp. 510–519 (2013)

Acknowledgements

The authors would like to thank Seyed Morteza Nabavinejad for his helpful advice and for helping us profile the workloads. We would also like to cordially thank Lena Mashayekhy for kindly providing us with the profiled data of the TeraSort benchmark workloads used in their experiments.

Author information

Corresponding author

Correspondence to Maziar Goudarzi.

About this article

Cite this article

Yousefi, M.H.N., Goudarzi, M. A Task-Based Greedy Scheduling Algorithm for Minimizing Energy of MapReduce Jobs. J Grid Computing 16, 535–551 (2018). https://doi.org/10.1007/s10723-018-9464-0

  • DOI: https://doi.org/10.1007/s10723-018-9464-0
