Distributed Grey Wolf Optimizer for scheduling of workflow applications in cloud environments

https://doi.org/10.1016/j.asoc.2021.107113

Highlights

  • A discrete variation of DGWO for scheduling workflow applications is proposed.

  • DGWO considers computation and data transmission costs as objectives for scheduling.

  • In experiments, DGWO performs better than popular schedulers.

Abstract

Optimal scheduling of workflows in cloud computing environments is essential to maximize the utilization of Virtual Machines (VMs). In practice, scheduling the dependent tasks of a workflow requires distributing them across the VMs available in the cloud. This paper introduces a discrete variation of the Distributed Grey Wolf Optimizer (DGWO) for scheduling dependent tasks to VMs. The scheduling process in DGWO is modeled as a minimization problem with two objectives: computation and data transmission costs. DGWO uses the largest order value (LOV) method to convert the continuous candidate solutions it produces into discrete candidate solutions. DGWO was experimentally tested and compared to well-known optimization-based scheduling algorithms (Particle Swarm Optimization (PSO) and the Grey Wolf Optimizer (GWO)). The experimental results suggest that DGWO distributes tasks to VMs faster than the other tested algorithms. In addition, DGWO was compared to PSO and Binary PSO (BPSO) using WorkflowSim and scientific workflows of different sizes. The obtained simulation results suggest that DGWO provides the best makespan of the compared algorithms.

Introduction

A workflow application is a commonly used term for an application composed of dependent tasks (i.e., tasks linked by data or control flow dependencies [1], [2]). Cloud computing describes a network of remote servers on the Internet that provides services such as storage management and data processing [3], [4], [5]. It provides Virtual Machines (VMs), or compute resources, to execute workflow applications (e.g., bioinformatics, astronomy and physics applications [6]). A key factor in the success of data processing services in cloud environments is the scheduling technique that maps workflow applications onto cloud resources such that the cost of using those resources is minimized. However, task scheduling in cloud computing environments is an NP-hard problem [7]. Therefore, in recent years several researchers have attempted to solve the scheduling problem using optimization-based scheduling algorithms (e.g., Particle Swarm Optimization (PSO) [8], PSO with load balancing mutation [9], Pareto-based Grey Wolf Optimizer (PGWO) [10], Multi-objective ant colony system (MOACS) [11], hybrid Gravitational Search Algorithm (GSA) [12], Genetic algorithm (GA) using an adaptive penalty function [13] and hybrid Bat and Binary Bat algorithm (BBA) [14]). The optimization algorithms underlying these scheduling algorithms may easily become trapped in local optima prematurely because of limitations in their exploration methods [15], [16], [17], [18], [19]. In addition, the performance of these optimization algorithms degrades on medium- and high-dimensional optimization problems.

It is also important to note that hybrid scheduling algorithms (e.g., hybrid GSA, hybrid BBA) normally require more execution time than traditional optimization-based scheduling algorithms (e.g., GSA, BA). This is because hybrid optimization algorithms integrate one or more local or global search methods into their optimization loops. Therefore, the length of an iteration of a hybrid algorithm is the sum of the time required for the optimization operators to complete and the time required for the integrated search method to finish. Moreover, the execution time of an iteration varies from one iteration to another depending on the complexity of the integrated search method. Consequently, the average length of an iteration in hybrid optimization algorithms is longer than that in traditional optimization algorithms [16], [17], [18].
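The iteration-time argument above can be written compactly. Assume, for illustration, that a hybrid algorithm and its traditional counterpart share the same optimization operators, which take time T_op per iteration, and that the integrated search method takes T_ls^(t) ≥ 0 in iteration t of N iterations (these symbols are introduced here, not in the paper):

```latex
T^{(t)}_{\text{hybrid}} = T_{\text{op}} + T^{(t)}_{\text{ls}}, \qquad
\bar{T}_{\text{hybrid}} = \frac{1}{N}\sum_{t=1}^{N} T^{(t)}_{\text{hybrid}}
  = T_{\text{op}} + \bar{T}_{\text{ls}} \;\ge\; T_{\text{op}} = \bar{T}_{\text{traditional}}
```

The inequality is strict whenever the integrated search method takes any time on average (i.e., the mean of T_ls^(t) is positive), and the variation of T_ls^(t) across iterations is what makes individual iteration lengths uneven.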

The Distributed Grey Wolf Optimizer (DGWO) is a parallel version of the Grey Wolf Optimizer (GWO) algorithm [18]; its optimization process can therefore be performed on parallel machines. Two main reasons make DGWO an interesting choice for solving optimization problems. First, DGWO converges faster than popular optimization algorithms such as the Grey Wolf Optimizer, Cuckoo search [20], [21], [22], the memory-based hybrid Dragonfly algorithm [23] and the Fireworks algorithm with differential mutation [24]. Second, DGWO has only one parameter, the vector a (Section 3.1), which does not require fine-tuning.
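Because DGWO's single parameter is the vector a, it helps to see where a enters the search. The sketch below shows the position update of the standard GWO that DGWO builds on: the three best wolves (alpha, beta, delta) guide the pack through coefficients A = 2a·r1 − a and C = 2·r2, with a decreasing linearly from 2 to 0. It does not show how DGWO distributes the search across parallel machines, because this excerpt does not describe that. Here a is a scalar applied to every dimension, which is equivalent to a vector with equal components; the function names and default settings are illustrative assumptions.

```python
import numpy as np

def select_leaders(points, fitness):
    """Return the three best points (alpha, beta, delta) and their scores (minimization)."""
    scores = np.array([fitness(p) for p in points])
    best = np.argsort(scores)[:3]
    return points[best], scores[best]

def update_positions(wolves, leaders, a, rng):
    """Standard GWO move: pull each wolf toward alpha, beta and delta, then average the pulls."""
    new = np.empty_like(wolves)
    for i, x in enumerate(wolves):
        pulls = []
        for leader in leaders:
            A = 2 * a * rng.random(x.shape) - a   # |A| > 1 favours exploration, |A| < 1 exploitation
            C = 2 * rng.random(x.shape)
            D = np.abs(C * leader - x)            # distance to this leader
            pulls.append(leader - A * D)
        new[i] = np.mean(pulls, axis=0)
    return new

def gwo(fitness, dim, n_wolves=10, iters=100, lb=-5.0, ub=5.0, seed=0):
    """Hypothetical driver: minimize `fitness` over the box [lb, ub]^dim."""
    rng = np.random.default_rng(seed)
    wolves = rng.uniform(lb, ub, size=(n_wolves, dim))
    leaders, scores = select_leaders(wolves, fitness)
    for t in range(iters):
        a = 2.0 - 2.0 * t / iters                 # the only control parameter: 2 -> 0 linearly
        wolves = np.clip(update_positions(wolves, leaders, a, rng), lb, ub)
        # keep best-so-far leaders by ranking the old leaders together with the new pack
        leaders, scores = select_leaders(np.vstack([leaders, wolves]), fitness)
    return leaders[0], scores[0]

# Usage: minimize the sphere function in three dimensions.
best, score = gwo(lambda x: float(np.sum(x ** 2)), dim=3)
```

Large |A| early in the run lets wolves move away from the leaders, and shrinking a gradually turns the search into exploitation, which is why a needs little tuning.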

In this paper, we propose an optimization-based scheduling algorithm for workflow applications based on the DGWO algorithm. We model the task scheduling problem of workflow applications as a minimization problem: find a mapping of the dependent tasks of a workflow application to compute resources in a cloud computing environment such that the total execution cost (i.e., computation and data transmission costs) of the application is minimized. DGWO produces discrete candidate solutions for the scheduling problem by applying the largest order value (LOV) method [25] to its continuous candidate solutions, as described in Section 4.1. We experimentally evaluated DGWO against well-known optimization-based scheduling algorithms such as Particle Swarm Optimization (PSO) [8] and GWO [26] using two types of workflows: balanced and imbalanced. The overall experimental results on balanced workflows indicate that DGWO outperforms the other algorithms on scheduling problems with various data sizes. The results on imbalanced workflows show DGWO's advantage over the other algorithms even more clearly.
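The LOV method is named but not spelled out in this excerpt. One common formulation ranks the components of a continuous position vector in descending order, so the largest value receives the first rank. The sketch below uses 0-based ranks. The `to_vm_mapping` decoding (rank modulo the number of VMs) is a hypothetical way to turn ranks into a task-to-VM assignment; it is not taken from the paper, whose Section 4.1 is not included here.

```python
import numpy as np

def lov_ranks(position):
    """Largest order value ranking: the largest component gets rank 0, the next largest rank 1, ..."""
    order = np.argsort(-np.asarray(position), kind="stable")  # indices by decreasing value
    ranks = np.empty(len(order), dtype=int)
    ranks[order] = np.arange(len(order))                      # invert the sort order into ranks
    return ranks

def to_vm_mapping(position, n_vms):
    """Hypothetical decoding: task i is assigned to VM (rank_i mod n_vms)."""
    return lov_ranks(position) % n_vms

x = [0.36, 0.45, 0.1, 0.67]        # continuous position for 4 tasks
print(lov_ranks(x).tolist())       # [2, 1, 3, 0]
print(to_vm_mapping(x, 2).tolist())  # [0, 1, 1, 0]
```

Ranking keeps the optimizer's continuous update rules untouched: only the relative order of the components matters for the decoded schedule, so any continuous move produces a valid discrete candidate.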

The main contributions of this paper are as follows:

  1. We introduce a discrete variation of DGWO (Algorithm 4) that can be used to solve various types of scheduling problems.

  2. We modify the dynamic scheduling algorithm proposed in [8], [10] to make it suitable for DGWO. Like the original dynamic scheduling algorithm, the modified algorithm (Algorithm 3) can handle both balanced and imbalanced workflow applications, and it aims to minimize the computation and data transmission costs.

  3. We use WorkflowSim and real scientific workflows to evaluate the performance of DGWO against the simulation results reported in [27] for two algorithms: PSO and Binary PSO (BPSO). The experimental results suggest that DGWO provides the lowest makespans for real scientific workflows of different sizes. They also indicate that DGWO's performance relative to BPSO and PSO improves as the workflow size increases.

  4. We also conduct experiments to evaluate and compare DGWO with two popular scheduling algorithms, PSO and GWO, using balanced and imbalanced workflows. The experimental results show that DGWO converges fastest of the three algorithms.
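The WorkflowSim comparison above is judged by makespan, the time at which the last task of the workflow finishes. The sketch below computes it for a given task-to-VM mapping under simplifying assumptions that are not the paper's or WorkflowSim's model: tasks are non-preemptive, each VM runs one task at a time in the workflow's topological order, and a data-transfer delay is added only when a parent and its child run on different VMs. `topological_order` and `makespan` are hypothetical helpers.

```python
from collections import defaultdict, deque

def topological_order(tasks, edges):
    """Kahn's algorithm over the workflow DAG; `edges` is keyed by (parent, child)."""
    indeg = {t: 0 for t in tasks}
    children = defaultdict(list)
    for (u, v) in edges:
        children[u].append(v)
        indeg[v] += 1
    queue = deque(t for t in tasks if indeg[t] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in children[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if len(order) != len(tasks):
        raise ValueError("workflow graph contains a cycle")
    return order

def makespan(tasks, edges, exec_time, mapping):
    """edges: (parent, child) -> transfer time across VMs; exec_time: task -> per-VM run times."""
    parents = defaultdict(list)
    for (u, v), transfer in edges.items():
        parents[v].append((u, transfer))
    finish, vm_free = {}, defaultdict(float)
    for t in topological_order(tasks, edges):
        vm = mapping[t]
        # a task is ready when all parents finished and their data has arrived
        ready = max((finish[u] + (0.0 if mapping[u] == vm else transfer)
                     for u, transfer in parents[t]), default=0.0)
        start = max(ready, vm_free[vm])           # wait for the VM to become idle
        finish[t] = start + exec_time[t][vm]
        vm_free[vm] = finish[t]
    return max(finish.values())

# Diamond workflow A -> {B, C} -> D on two identical VMs.
tasks = ["A", "B", "C", "D"]
edges = {("A", "B"): 1.0, ("A", "C"): 1.0, ("B", "D"): 1.0, ("C", "D"): 1.0}
exec_time = {t: [3.0, 3.0] for t in tasks}
print(makespan(tasks, edges, exec_time, {"A": 0, "B": 0, "C": 0, "D": 0}))  # 12.0
print(makespan(tasks, edges, exec_time, {"A": 0, "B": 0, "C": 1, "D": 0}))  # 11.0
```

Moving C to the second VM lets B and C overlap, and the one-unit transfer delays cost less than the time saved, which is the kind of trade-off a scheduler such as DGWO searches over.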

The rest of the paper is organized as follows: Section 2 reviews recent methods that attempt to solve the task scheduling problem. Section 3 provides background on the Grey Wolf Optimizer algorithm, the Distributed Grey Wolf Optimizer and the task scheduling problem in workflow applications. Section 4 discusses the proposed scheduling algorithms in detail. Section 5 presents the experimental results. Finally, Section 6 presents the conclusions of this paper and discusses directions for future work.

Section snippets

Recent work

The task scheduling problem in cloud computing environments is an NP-hard problem [7]. In recent years, several researchers have attempted to solve the scheduling problem using optimization-based scheduling algorithms [8], [9], [10], [11], [12], [13], [14], [28], [29], [30], [31], [32], [33], [34]. This section provides an overview of recently proposed optimization-based scheduling algorithms.
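For intuition about why exhaustive search is impractical (NP-hardness itself is a separate, stronger statement), suppose each of n tasks may be placed on any of m VMs. Ignoring the order of tasks on each VM, the number of task-to-VM mappings alone is:

```latex
|\mathcal{S}| = m^{n}, \qquad \text{e.g. } n = 20,\ m = 4:\quad
4^{20} = 2^{40} = 1\,099\,511\,627\,776 \approx 1.1 \times 10^{12}
```

Even this small example rules out enumeration, which is why metaheuristics such as PSO, GWO and DGWO sample the space instead.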

Pandey et al. [8] proposed a Particle Swarm Optimization-based heuristic, one of the

Preliminaries

This section summarizes some of the underlying concepts of the Grey Wolf Optimizer algorithm (Section 3.1) and the Distributed Grey Wolf Optimizer algorithm (Section 3.2). The final part of the section (Section 3.3) describes the mathematical formulation of the task scheduling problem in cloud computing environments.

Distributed Grey Wolf Optimizer for cloud workflow scheduling

One of the most important services of cloud environments is the distributed computing service, which provides various distributed compute resources to execute applications such as workflow applications. Scheduling the tasks of a workflow application requires assigning them to compute resources in the cloud. This process depends mainly on the availability of compute resources and on the network load. Unfortunately, this scheduling problem is NP-hard [47]. This

Experiments

This section is divided into seven subsections as follows: Section 5.1 provides the comparison measures, Section 5.2 describes the experimental setup for the algorithms, including data and implementation, Section 5.3 compares the algorithms on balanced workflows, Section 5.4 compares the algorithms on imbalanced workflows, Section 5.5 discusses the overall experimental results and the limitations of DGWO, Section 5.6 provides

Conclusions and future work

This paper presented an optimization-based scheduling algorithm for workflow applications based on the Distributed Grey Wolf Optimizer (DGWO) algorithm. The task scheduling problem of workflow applications was modeled as the following optimization problem: find a mapping of the dependent tasks of a workflow application to compute resources in a cloud computing environment such that the total execution cost (computation and data transmission costs) of the application is minimized.
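The objective described above can be sketched as a fitness function. The excerpt does not give the paper's exact cost formulas, so the following is a minimal sketch under assumed inputs: a per-task, per-VM computation cost; a data size for every parent-child edge; and a per-unit transfer cost between VMs that is zero when both tasks share a VM. `total_cost` and its argument layout are hypothetical.

```python
def total_cost(mapping, exec_cost, data_size, unit_transfer_cost):
    """Total execution cost of a task-to-VM mapping (lower is better).

    mapping: task -> VM index
    exec_cost: task -> list of computation costs, one per VM
    data_size: (parent, child) -> amount of data sent along that workflow edge
    unit_transfer_cost: matrix of cost per unit of data from VM i to VM j (0 on the diagonal)
    """
    computation = sum(exec_cost[t][vm] for t, vm in mapping.items())
    transmission = sum(size * unit_transfer_cost[mapping[u]][mapping[v]]
                       for (u, v), size in data_size.items())
    return computation + transmission

# Three tasks (A -> B, A -> C) on two VMs.
exec_cost = {"A": [2, 3], "B": [4, 1], "C": [1, 5]}
data_size = {("A", "B"): 10, ("A", "C"): 4}
unit = [[0, 0.5], [0.5, 0]]
print(total_cost({"A": 0, "B": 1, "C": 0}, exec_cost, data_size, unit))  # 9.0
print(total_cost({"A": 0, "B": 0, "C": 0}, exec_cost, data_size, unit))  # 7
```

Placing B on the cheaper VM saves 3 units of computation but adds 5 units of transfer, so co-locating all tasks wins here; the balance between the two terms is what the optimizer explores. Some formulations instead minimize the largest per-VM cost to balance load; the excerpt does not say which variant DGWO uses.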

The

CRediT authorship contribution statement

Bilal H. Abed-alguni: Conceptualization, Methodology, Investigation, Validation, Writing - original draft, Supervision. Noor Aldeen Alawad: Experimentation, Visualization, Reviewing and editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References (63)

  • Tejani G.G. et al.

    Adaptive symbiotic organisms search (SOS) algorithm for structural design optimization

    J. Comput. Des. Eng.

    (2016)
  • Ullman J.D.

    NP-complete scheduling problems

    J. Comput. Syst. Sci.

    (1975)
  • Al-Betar M.A. et al.

    Island bat algorithm for optimization

    Expert Syst. Appl.

    (2018)
  • Al-Betar M.A. et al.

    Island-based harmony search for optimization problems

    Expert Syst. Appl.

    (2015)
  • Mehta S. et al.

    Scheduling data intensive scientific workflows in cloud environment using nature inspired algorithms

  • Schwarzkopf M. et al.

    Omega: flexible, scalable schedulers for large compute...
  • Fernández-Cerero D. et al.

    Bullfighting extreme scenarios in efficient hyper-scale cluster computing

    Cluster Comput.

    (2020)
  • Rafat H. et al.

    A semantic-based approach for managing healthcare big data: A survey

    J. Healthc. Eng.

    (2020)
  • Dinh H.T. et al.

    A survey of mobile cloud computing: architecture, applications, and approaches

    Wirel. Commun. Mob. Comput.

    (2013)
  • Rajagopalan A. et al.

    Optimal scheduling of tasks in cloud computing using hybrid firefly-genetic algorithm

  • Pandey S. et al.

    A particle swarm optimization-based heuristic for scheduling workflow applications in cloud computing environments

  • Khalili A. et al.

    Optimal scheduling workflows in cloud computing environment using Pareto-based Grey Wolf Optimizer

    Concurr. Comput.: Pract. Exper.

    (2017)
  • Chen Z.-G. et al.

    Multiobjective cloud workflow scheduling: a multiple populations ant colony system approach

    IEEE Trans. Cybern.

    (2018)
  • Liu L. et al.

    Deadline-constrained coevolutionary genetic algorithm for scientific workflow scheduling in cloud computing

    Concurr. Comput.: Pract. Exper.

    (2017)
  • Raghavan S. et al.

    Bat algorithm for scheduling workflow applications in cloud

  • Abed-alguni B.H. et al.

    Hybrid whale optimization and β-hill climbing algorithm

    Int. J. Comput. Sci. Math.

    (2018)
  • Abed-alguni B.H. et al.

    Island-based whale optimization algorithm for continuous optimization problems

    Int. J. Reason. Based Intell. Syst.

    (2019)
  • Abed-alguni B.H.

    Island-based cuckoo search with highly disruptive polynomial mutation

    Int. J. Artif. Intell.

    (2019)
  • Abed-alguni B.H. et al.

    Distributed grey wolf optimizer for numerical optimization problems

    Jordanian J. Comput. Inf. Technol. (JJCIT)

    (2018)
  • Abed-alguni B.H. et al.

    Intelligent hybrid cuckoo search and β-hill climbing algorithm

    J. King Saud Univ. Comput. Inf. Sci.

    (2018)
  • Alkhateeb F. et al.

    A hybrid cuckoo search and simulated annealing algorithm

    J. Intell. Syst.

    (2017)