A Task-Based Greedy Scheduling Algorithm for Minimizing Energy of MapReduce Jobs

Yousefi, Mostafa Hadadian Nejad; Goudarzi, Maziar

doi:10.1007/s10723-018-9464-0

Published: 22 August 2018

Volume 16, pages 535–551, (2018)

Journal of Grid Computing

Abstract

MapReduce and its open source implementation, Hadoop, have gained widespread adoption for parallel processing of big data jobs. Since the number of such big data jobs is also rising rapidly, reducing their energy consumption is increasingly important to reduce environmental impact as well as operational costs. Prior work by Mashayekhy et al. (IEEE Trans. Parallel Distributed Syst. 26, 2720–2733, 2015) has tackled the problem of energy-aware scheduling of a single MapReduce job, but we provide a far more efficient heuristic in this paper. We first model the problem as an Integer Linear Program (ILP) to find the optimal solution using ILP solvers. Then we present a task-based greedy scheduling algorithm, TGSAVE, which selects a slot for each task to minimize the total energy consumption of a MapReduce job for big data applications in heterogeneous environments, without significant performance loss and while satisfying the service level agreement (SLA). We perform several experiments on a Hadoop cluster to measure the characteristics of tasks for nine different applications and use them to evaluate our proposed algorithm. The results show that the total energy consumption of MapReduce jobs obtained by TGSAVE is up to 35% less than that achieved by EMRSA, proposed in Mashayekhy et al. (IEEE Trans. Parallel Distributed Syst. 26, 2720–2733, 2015), its closest rival, for the same workloads. In addition, TGSAVE is capable of finding a solution in the same order of time for deadlines up to 74% tighter than the tightest deadline for which EMRSA can find a feasible solution. On average, the TGSAVE solution is approximately 1.4% away from the optimal solution, and it can meet deadlines that are, on average, as tight as 12% above the energy-oblivious minimum makespan in the benchmarks we examined.
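
The idea of choosing a slot per task under a deadline can be illustrated with a minimal sketch. This is not TGSAVE itself, whose details are not given in the abstract; it is a generic greedy rule written under stated assumptions: per-task energy and run time on each slot are known from profiling, each slot runs its tasks back to back, and the map/reduce phase ordering is ignored. The function name `greedy_assign`, the task ordering, and the input layout are all hypothetical.

```python
from typing import List, Tuple


def greedy_assign(
    energy: List[List[float]],
    runtime: List[List[float]],
    deadline: float,
) -> Tuple[List[int], float]:
    """Assign each task to the cheapest slot that still meets the deadline.

    energy[t][s]  -- assumed profiled energy of task t on slot s
    runtime[t][s] -- assumed profiled run time of task t on slot s
    Returns (assignment, total_energy), where assignment[t] is a slot index.
    Raises ValueError if some task cannot be placed before the deadline.
    """
    n_tasks = len(energy)
    n_slots = len(energy[0])
    busy = [0.0] * n_slots          # accumulated run time per slot
    assignment = [-1] * n_tasks
    total_energy = 0.0

    # Illustrative ordering (an assumption): place the longest tasks first,
    # since they are the hardest to fit under a tight deadline.
    order = sorted(range(n_tasks), key=lambda t: -min(runtime[t]))

    for t in order:
        # Slots on which task t would still finish by the deadline.
        feasible = [
            s for s in range(n_slots) if busy[s] + runtime[t][s] <= deadline
        ]
        if not feasible:
            raise ValueError(f"task {t} cannot meet deadline {deadline}")
        # Greedy step: among feasible slots, take the least energy.
        best = min(feasible, key=lambda s: energy[t][s])
        assignment[t] = best
        busy[best] += runtime[t][best]
        total_energy += energy[t][best]

    return assignment, total_energy


if __name__ == "__main__":
    # Slot 0 is slow but frugal; slot 1 is fast but power-hungry.
    energy = [[1.0, 5.0], [1.0, 5.0], [1.0, 5.0]]
    runtime = [[4.0, 2.0], [4.0, 2.0], [4.0, 2.0]]
    print(greedy_assign(energy, runtime, deadline=8.0))
```

With these toy inputs, two tasks fit on the frugal slot before the deadline and the third spills onto the faster, costlier slot. A rule this simple makes no optimality claim; it only shows how the deadline turns a pure energy minimization into a constrained slot choice.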

References

Annual Energy Review: Tech. rep. (2012). http://www.eia.gov/totalenergy/data/annual/pdf/aer.pdf (2011)
Adaptive Computing, I.: TORQUE Resource Manager. http://www.adaptivecomputing.com/products/open-source/torque/
Ahmad, F., Chakradhar, S., Raghunathan, A., Vijaykumar, T.N.: Tarazu: Optimizing MapReduce On Heterogeneous Clusters. In: Proceedings of the 17th International Conference on Architectural Support for Programming Languages and Operating Systems ASPLOS 40, pp. 61–74 (2012)
Ahmad, F., Lee, S., Thottethodi, M., Vijaykumar, T.N.: PUMA: Purdue MapReduce Benchmarks Suite (2012)
Anjos, J.C.S., Carrera, I., Kolberg, W., Tibola, A.L., Arantes, L.B., Geyer, C.R.: MRA++: scheduling and data placement on MapReduce for heterogeneous environments. Futur Gener Comput Syst 42, 22–35 (2015)
Apache: Capacity Scheduler for Hadoop. https://hadoop.apache.org/docs/r1.2.1/capacity_scheduler.html
Apache: Hadoop. Hadoop.Apache.org (2016)
Apache: Hadoop Fair Scheduler. hadoop.apache.org/docs/r1.2.1/fair_scheduler.html (2016)
Apache: HOD Scheduler. https://hadoop.apache.org/docs/r1.2.1/capacity_scheduler.html (2016)
Bryk, P., Malawski, M., Juve, G., Deelman, E.: Storage-aware algorithms for scheduling of workflow ensembles in clouds. Journal of Grid Computing 14(2), 359–378 (2016)
Cho, B., Rahman, M., Chajed, T., Gupta, I.: Natjam: eviction policies for supporting priorities and deadlines in mapreduce clusters (2013)
Cisco: Cisco Global Cloud Index: Forecast and Methodology, 2014–2019. Tech. rep. (2014)
Dean, J., Ghemawat, S.: MapReduce: Simplified Data Processing on Large Clusters. Commun. ACM 51, 107–113 (2008)
Ebrahimirad, V., Goudarzi, M., Rajabi, A.: Energy-aware scheduling for precedence-constrained parallel virtual machines in virtualized data centers. Journal of Grid Computing 13(2), 233–253 (2015)
Fredman, M., Tarjan, R.: Fibonacci heaps and their uses in improved network optimization algorithms. J. Assoc. Comput. Mach. 34, 596–615 (1987)
Guo, Z., Fox, G.: Improving MapReduce performance in heterogeneous network environments and resource utilization. In: Proceedings - 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGrid 2012, pp. 714–716 (2012)
Kansal, N.J., Chana, I.: Energy-aware virtual machine migration for cloud computing-a firefly optimization approach. Journal of Grid Computing 14(2), 327–345 (2016)
Kim, H., Ahn, J. H., Kim, J.: Exploiting replicated cache blocks to reduce L2 cache leakage in CMPs. IEEE Transactions on Very Large Scale Integration (VLSI) Systems (10), 1863–1877 (2013)
Krish, K., Anwar, A., Butt, A.R.: φSched: A Heterogeneity-Aware Hadoop Workflow Scheduler. In: 2014 IEEE 22nd International Symposium on Modelling, Analysis & Simulation of Computer and Telecommunication Systems, pp. 255–264 (2014)
Lang, W., Patel, J.M.: Energy management for MapReduce clusters. Proceedings of the VLDB Endowment 3, 129–139 (2010)
Leverich, J., Kozyrakis, C.: On the energy (in)efficiency of Hadoop clusters. ACM SIGOPS Operating Systems Review 44, 61–65 (2010)
Marszałkowski, J.M., Drozdowski, M., Marszałkowski, J.: Time and energy performance of parallel systems with hierarchical memory. Journal of Grid Computing 14(1), 153–170 (2016)
Mashayekhy, L., Movahed Nejad, M., Grosu, D., Zhang, Q., Shi, W.: Energy-aware Scheduling of MapReduce Jobs for Big Data Applications. IEEE Transactions on Parallel and Distributed Systems 26, 2720–2733 (2015)
Meisner, D., Gold, B.T., Wenisch, T.F.: PowerNap. ACM SIGARCH Computer Architecture News 37, 205 (2009)
Nabavinejad, S.M., Goudarzi, M., Abedi, S.: MapReduce Service Provisioning for Frequent Jobs on Green Clouds Considering Data Transfers. Technical Report, Computer Engineering Department Sharif University of Technology (2016)
Pereira, R., Couto, M., Ribeiro, F., Rua, R., Cunha, J., Fernandes, J.P., Saraiva, J.: Energy efficiency across programming languages: how do energy, time, and memory relate?. In: Proceedings of the 10th ACM SIGPLAN International Conference on Software Language Engineering, pp. 256–267. ACM (2017)
Powell, M.D., Yang, S.H., Falsafi, B., Roy, K., Vijaykumar, T.N.: An Energy-Efficient High-Performance Deep-Submicron instruction cache. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, pp. 1–13 (2001)
Rasooli, A., Down, D.G.: Guidelines for selecting hadoop schedulers based on system heterogeneity. Journal of Grid Computing 12(3), 499–519 (2014)
Sueur, E.L., Heiser, G.: Dynamic voltage and frequency scaling: The laws of diminishing returns. In: Proceedings of the 2010 international conference on Power aware computing and systems, pp. 1–8 (2010)
Tang, Z., Qi, L., Cheng, Z., Li, K., Khan, S.U., Li, K.: An energy-efficient task scheduling algorithm in dvfs-enabled cloud environment. Journal of Grid Computing 14(1), 55–74 (2016)
Tavarageri, S., Sadayappan, P.: A compiler analysis to determine useful cache size for energy efficiency. In: 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum, pp. 923–930 (2013)
Tian, C., Zhou, H., He, Y., Zha, L.: A dynamic mapreduce scheduler for heterogeneous workloads. In: 2009 Eighth International Conference on Grid and Cooperative Computing, pp. 218–224 (2009)
Wang, Y., Lu, W., Lou, R., Wei, B.: Improving mapreduce performance with partial speculative execution. Journal of Grid Computing 13(4), 587–604 (2015)
White, T.: Hadoop: The Definitive Guide, O’Reilly Media, Inc (2012)
Wolf, J., Rajan, D., Hildrum, K., Khandekar, R., Kumar, V., Parekh, S., Wu, K.L., Balmin, A.: FLEX: a slot allocation scheduling optimizer for MapReduce workloads. In: ACM/IFIP/USENIX International Conference on Distributed Systems Platforms and Open Distributed Processing, pp. 1–20 (2010)
Xie, J., Yin, S., Ruan, X., Ding, Z., Tian, Y., Majors, J., Manzanares, A., Qin, X.: Improving MapReduce performance through data placement in heterogeneous Hadoop clusters. In: 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW) 9, pp. 29–42 (2010)
Yan, F., Cherkasova, L., Zhang, Z., Smirni, E.: DyScale: a mapreduce job scheduler for heterogeneous multicore processors (2015)
Yang, S.J., Chen, Y.R.: Design adaptive task allocation scheduler to improve MapReduce performance in heterogeneous clouds. J. Netw. Comput. Appl. 57, 61–70 (2015)
Yigitbasi, N., Datta, K., Jain, N., Willke, T.: Energy efficient scheduling of MapReduce workloads on heterogeneous clusters. In: Proceedings of the 2nd International Workshop - GCM ’11 on Green Computing Middleware, pp. 1–6 (2011)
Zaharia, M., Konwinski, A., Joseph, A.D., Katz, R., Stoica, I.: Improving mapreduce performance in heterogeneous environments. Proceedings of the USENIX OSDI, pp. 8 (2008)
Zhang, Q., Zhani, M.F., Boutaba, R., Hellerstein, J.L.: Dynamic heterogeneity-aware resource provisioning in the cloud. In: Distributed Computing Systems (ICDCS), 2013 IEEE 33rd International Conference on, pp. 510–519 (2013)

Acknowledgements

The authors would like to thank Seyed Morteza Nabavinejad for his helpful advice and for his help in profiling workloads. We would also like to cordially thank Lena Mashayekhy for kindly providing us with the profiled data of the TeraSort benchmark workloads used in their experiments.

Author information

Authors and Affiliations

Computer Engineering Department, Sharif University of Technology, Tehran, Iran
Mostafa Hadadian Nejad Yousefi & Maziar Goudarzi

Corresponding author

Correspondence to Maziar Goudarzi.

About this article

Cite this article

Yousefi, M.H.N., Goudarzi, M. A Task-Based Greedy Scheduling Algorithm for Minimizing Energy of MapReduce Jobs. J Grid Computing 16, 535–551 (2018). https://doi.org/10.1007/s10723-018-9464-0

Received: 21 August 2017
Accepted: 07 August 2018
Published: 22 August 2018
Issue Date: December 2018
DOI: https://doi.org/10.1007/s10723-018-9464-0
