An energy-aware scheduling algorithm for big data applications in Spark

Li, Hongjian; Wang, Huochen; Fang, Shuyong; Zou, Yang; Tian, Wenhong

doi:10.1007/s10586-019-02947-9

Published: 04 June 2019

Cluster Computing, volume 23, pages 593–609 (2020)

Hongjian Li (ORCID: orcid.org/0000-0002-8558-0838)¹,²,
Huochen Wang¹,
Shuyong Fang¹,
Yang Zou¹ &
Wenhong Tian²

Abstract

Energy consumption is increasing explosively with the fast growth of big data applications, and the high carbon emissions of big data platforms have a serious impact on the environment. In this paper, we propose an energy-aware scheduling algorithm for Spark (EASAS) that reduces energy consumption while satisfying the service level agreement (SLA). First, we present a new energy consumption model based on the Spark framework. Then we design a strategy table that relates tasks to executors and records the execution time and energy consumption of tasks. Task scheduling in Spark is then conducted and optimized based on this strategy table. Because it perceives energy consumption and optimizes scheduling dynamically, the proposed strategy overcomes a defect of the default FIFO and FAIR scheduling strategies, which cannot perceive energy consumption. Compared with FIFO and FAIR, EASAS reduces the total energy consumption of Spark applications by about 25–40% on average under deadline constraints.
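To illustrate the idea of a strategy table, the following minimal Python sketch records an execution time and an energy value for each (task, executor) pair and picks the lowest-energy executor that still meets a deadline. It is only an illustration under stated assumptions: the abstract does not give the table layout, the energy model, or the selection rule used by EASAS, so the names `Entry`, `TABLE`, and `pick_executor`, the sample numbers, and the "minimum energy among deadline-feasible executors" rule are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Entry:
    """One strategy-table cell (hypothetical layout)."""
    exec_time_s: float  # recorded execution time of the task on the executor
    energy_j: float     # recorded energy consumption of that execution


# Strategy table keyed by (task id, executor id). The values are made up.
TABLE: Dict[Tuple[str, str], Entry] = {
    ("t1", "exec-a"): Entry(exec_time_s=10.0, energy_j=500.0),
    ("t1", "exec-b"): Entry(exec_time_s=6.0, energy_j=800.0),
    ("t1", "exec-c"): Entry(exec_time_s=20.0, energy_j=300.0),
}


def pick_executor(
    table: Dict[Tuple[str, str], Entry],
    task: str,
    executors: List[str],
    deadline_s: float,
) -> Optional[str]:
    """Return the executor with the lowest recorded energy whose recorded
    execution time fits within the deadline, or None if none qualifies.

    Assumed selection rule, not the algorithm from the paper.
    """
    feasible = [
        (table[(task, e)].energy_j, e)
        for e in executors
        if (task, e) in table and table[(task, e)].exec_time_s <= deadline_s
    ]
    if not feasible:
        return None
    # Tuples compare by energy first, then by executor id to break ties.
    return min(feasible)[1]
```

With a 12-second deadline, `pick_executor(TABLE, "t1", ["exec-a", "exec-b", "exec-c"], 12.0)` selects `exec-a`: `exec-c` uses less energy but would miss the deadline, and `exec-b` is faster but costs more energy. A scheduler that ignores energy, as the abstract says FIFO and FAIR do, has no basis for making this trade-off.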

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant Nos. 61672136, 6167060383, 61650110513, 61672004), China Postdoctoral Science (Grant No. 2016M600733), the Chongqing Science and Technology Commission Project (Grant Nos. cstc2017jcyjAX0142 and cstc2018jcyjAX0525), the Key Research and Development Projects of the Sichuan Science and Technology Department (Grant No. 2019YFG0107), and the Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences (Project ID: R51A150Z10).

Author information

Authors and Affiliations

Department of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China
Hongjian Li, Huochen Wang, Shuyong Fang & Yang Zou
Department of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu, 610054, China
Hongjian Li & Wenhong Tian

Corresponding authors

Correspondence to Hongjian Li or Wenhong Tian.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Li, H., Wang, H., Fang, S. et al. An energy-aware scheduling algorithm for big data applications in Spark. Cluster Comput 23, 593–609 (2020). https://doi.org/10.1007/s10586-019-02947-9

Received: 28 September 2018
Revised: 15 April 2019
Accepted: 27 May 2019
Published: 04 June 2019
Issue Date: June 2020
DOI: https://doi.org/10.1007/s10586-019-02947-9
