An energy-aware scheduling algorithm for big data applications in Spark

Cluster Computing

Abstract

Energy consumption is increasing explosively with the rapid growth of big data applications, and the high carbon emissions of big data platforms have a serious impact on the environment. In this paper, we propose an energy-aware scheduling algorithm for Spark (EASAS) that reduces energy consumption while satisfying the service level agreement (SLA). First, we present a new energy consumption model based on the Spark framework. Then we design a strategy table that captures the relationship between tasks and executors by recording the execution time and energy consumption of each task. Task scheduling in Spark is then performed and optimized based on this strategy table. The default scheduling strategies, FIFO and FAIR, cannot perceive energy consumption; the proposed strategy overcomes this limitation by combining energy-consumption awareness with dynamic scheduling optimization. Compared with FIFO and FAIR, EASAS reduces the total energy consumption of Spark applications by about 25–40% on average under deadline constraints.
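The abstract does not give the details of the energy model or of the strategy table, so the following is only a minimal sketch of the general idea, under stated assumptions: each table cell holds an estimated execution time and energy cost for a task on an executor, and a greedy rule picks the lowest-energy executor that still finishes before the deadline. The names `Estimate`, `pick_executor`, `schedule`, the executor labels, and all numbers are hypothetical and are not taken from the paper or from Spark's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Estimate:
    """One strategy-table cell: predicted cost of a task on an executor."""
    seconds: float  # estimated execution time
    joules: float   # estimated energy consumption


def pick_executor(table, task, busy_until, deadline):
    """Pick the lowest-energy executor that can finish `task` by `deadline`.

    `busy_until` maps an executor to the time at which it becomes free.
    If no executor can meet the deadline, fall back to the one that
    finishes earliest (an assumption of this sketch).
    """
    candidates = table[task]
    finish = {ex: busy_until.get(ex, 0.0) + est.seconds
              for ex, est in candidates.items()}
    feasible = [ex for ex in candidates if finish[ex] <= deadline]
    if feasible:
        # Least energy first; break ties by earlier finish time.
        return min(feasible, key=lambda ex: (candidates[ex].joules, finish[ex]))
    return min(candidates, key=lambda ex: finish[ex])


def schedule(table, tasks, deadline):
    """Greedily assign tasks in order; return (plan, total estimated energy)."""
    busy_until = {}
    plan = {}
    total_joules = 0.0
    for task in tasks:
        ex = pick_executor(table, task, busy_until, deadline)
        est = table[task][ex]
        busy_until[ex] = busy_until.get(ex, 0.0) + est.seconds
        plan[task] = ex
        total_joules += est.joules
    return plan, total_joules


# Hypothetical strategy table: task -> executor -> estimate.
TABLE = {
    t: {"fast": Estimate(2.0, 50.0), "eco": Estimate(4.0, 30.0)}
    for t in ("t1", "t2", "t3")
}

plan, total = schedule(TABLE, ["t1", "t2", "t3"], deadline=8.0)
print(plan, total)
```

In this toy table the first two tasks go to the slower, lower-energy executor; the third would miss the 8-second deadline there, so it is placed on the faster executor instead. An energy-unaware policy has no such table to consult, which is the gap the abstract attributes to FIFO and FAIR.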

Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant Nos. 61672136, 6167060383, 61650110513, 61672004), the China Postdoctoral Science Foundation (Grant No. 2016M600733), the Chongqing Science and Technology Commission Project (Grant Nos. cstc2017jcyjAX0142 and cstc2018jcyjAX0525), the Key Research and Development Projects of Sichuan Science and Technology Department (Grant No. 2019YFG0107), and the Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences (Project ID: R51A150Z10).

Author information

Correspondence to Hongjian Li or Wenhong Tian.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Li, H., Wang, H., Fang, S. et al. An energy-aware scheduling algorithm for big data applications in Spark. Cluster Comput 23, 593–609 (2020). https://doi.org/10.1007/s10586-019-02947-9

  • DOI: https://doi.org/10.1007/s10586-019-02947-9
