Abstract
Cloud computing is a suitable platform for workflows that process massive data. Through virtualization, cloud computing turns physical infrastructure into virtual machines (VMs), which can meet fluctuating, dynamic requests and are simpler to manage. Workflow scheduling matters in this setting because proper scheduling improves the efficiency of the cloud and reduces energy consumption. Because energy efficiency is one of the most important issues in cloud computing, this paper proposes a new algorithm, based on statistical analysis, for determining the similarity of input workflows. The proposed algorithm, called the massive data similarity statistics analysis algorithm (MSSA), classifies virtual machines into virtual clusters and performs scheduling by re-forming those clusters. MSSA also examines the similarity of message passing in two different periods, uses it to decide for the next period, and finally balances the load with a new method for transferring machines among virtual clusters. The algorithm is compared, in terms of makespan and energy consumption, with traditional methods such as FIFO, heuristic methods such as BlindPick, and a relatively new method named eOO. Simulation results with CloudSim show that the proposed algorithm is more energy efficient than these methods and reduces makespan significantly.
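The abstract does not specify the statistic MSSA uses or how it moves machines, so the following is only a minimal sketch of the general idea, not the paper's algorithm. It assumes cosine similarity as a stand-in similarity measure, a hypothetical threshold of 0.9, and a proportional reallocation rule; the names `cosine_similarity` and `plan_next_period` are illustrative.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length load vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def plan_next_period(prev_load, curr_load, vms_per_cluster, threshold=0.9):
    """Return a VM count per virtual cluster for the next period.

    prev_load / curr_load: messages observed per cluster in two periods.
    If the two periods look similar, the current allocation is kept.
    Otherwise the same total number of VMs is redistributed in proportion
    to the most recent load (an assumed rule, not taken from the paper).
    """
    if cosine_similarity(prev_load, curr_load) >= threshold:
        return list(vms_per_cluster)

    total_vms = sum(vms_per_cluster)
    total_load = sum(curr_load)
    if total_load == 0:
        return list(vms_per_cluster)

    # Largest-remainder rounding keeps the total number of VMs unchanged.
    shares = [total_vms * load / total_load for load in curr_load]
    plan = [int(s) for s in shares]
    by_remainder = sorted(
        range(len(shares)), key=lambda i: shares[i] - plan[i], reverse=True
    )
    for i in by_remainder[: total_vms - sum(plan)]:
        plan[i] += 1
    return plan
```

With three clusters of four VMs each, a small shift in load such as `[100, 100, 100]` to `[110, 90, 100]` leaves the allocation unchanged, while a large shift to `[300, 50, 50]` moves VMs toward the first cluster without changing the total.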
About this article
Cite this article
Grami, M. An energy-aware scheduling of dynamic workflows using big data similarity statistical analysis in cloud computing. J Supercomput 78, 4261–4289 (2022). https://doi.org/10.1007/s11227-021-04016-8
DOI: https://doi.org/10.1007/s11227-021-04016-8