Abstract
Energy consumption is increasing explosively with the fast growth of big data applications, and the high carbon emissions of big data platforms have a serious impact on the environment. In this paper, we propose an energy-aware scheduling algorithm for Spark (EASAS) that reduces energy consumption while satisfying the service level agreement (SLA). First, we present a new energy consumption model based on the Spark framework. Then, we design a strategy table that captures the relationship between tasks and executors and records the execution time and energy consumption of tasks. Task scheduling in Spark is performed and optimized based on this strategy table. Because it is energy-aware and optimizes scheduling dynamically, the proposed strategy overcomes the shortcoming of Spark's default scheduling strategies, FIFO and FAIR, which cannot perceive energy consumption. Compared with FIFO and FAIR, EASAS reduces the total energy consumption of Spark applications by about 25–40% on average under deadline constraints.
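The idea of a strategy table can be illustrated with a minimal sketch. This is not the EASAS algorithm itself, whose details are not given in the abstract. It assumes a table keyed by (task, executor) pairs that stores an estimated execution time and energy consumption, and a simple rule that picks the lowest-energy executor whose estimated time still meets the deadline. The names `Entry`, `pick_executor`, and the sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Entry:
    """One strategy-table record (hypothetical layout)."""
    exec_time_s: float  # estimated execution time in seconds
    energy_j: float     # estimated energy consumption in joules


# Hypothetical strategy table: (task id, executor id) -> recorded estimates.
StrategyTable = Dict[Tuple[str, str], Entry]


def pick_executor(table: StrategyTable, task: str, deadline_s: float) -> Optional[str]:
    """Return the lowest-energy executor that meets the deadline, or None.

    Assumption: the deadline stands in for the SLA constraint named in the
    abstract; ties on energy are broken by shorter execution time.
    """
    candidates = [
        (entry.energy_j, entry.exec_time_s, executor)
        for (t, executor), entry in table.items()
        if t == task and entry.exec_time_s <= deadline_s
    ]
    if not candidates:
        return None
    return min(candidates)[2]


# Illustrative values only; they are not measurements from the paper.
table: StrategyTable = {
    ("t1", "exec-a"): Entry(exec_time_s=10.0, energy_j=500.0),
    ("t1", "exec-b"): Entry(exec_time_s=20.0, energy_j=300.0),
    ("t1", "exec-c"): Entry(exec_time_s=40.0, energy_j=200.0),
}

choice = pick_executor(table, "t1", deadline_s=25.0)
```

With a 25-second deadline, the slowest executor is excluded even though it uses the least energy, so the sketch trades some energy for meeting the time constraint.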
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant Nos. 61672136, 6167060383, 61650110513, 61672004), China Postdoctoral Science (Grant No. 2016M600733), the Chongqing Science and Technology Commission Project (Grant Nos. cstc2017jcyjAX0142 and cstc2018jcyjAX0525), the Key Research and Development Projects of the Sichuan Science and Technology Department (Grant No. 2019YFG0107), and the Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences (Project ID: R51A150Z10).
Cite this article
Li, H., Wang, H., Fang, S. et al. An energy-aware scheduling algorithm for big data applications in Spark. Cluster Comput 23, 593–609 (2020). https://doi.org/10.1007/s10586-019-02947-9
DOI: https://doi.org/10.1007/s10586-019-02947-9