
Future Generation Computer Systems

Volume 86, September 2018, Pages 480-506

A hyper-heuristic cost optimisation approach for Scientific Workflow Scheduling in cloud computing

https://doi.org/10.1016/j.future.2018.03.055

Highlights

  • Proposes a completion time driven hyper-heuristic approach for cost optimisation.

  • Provides a comprehensive background on scientific workflow scheduling in cloud.

  • Proposed approach helps in saving the cost and time of the cloud service providers.

  • Significantly optimises the execution cost and time compared to baseline approaches.

Abstract

Effective management of Scientific Workflow Scheduling (SWFS) processes in a cloud environment remains a challenging task when dealing with large and complex Scientific Workflow Applications (SWFAs). Cost optimisation of SWFS benefits both cloud service consumers and providers by reducing the temporal and monetary costs of processing SWFAs. However, the cost optimisation performance of SWFS approaches is affected by the inherent nature of the SWFA as well as by scenarios that vary in the number of available virtual machines and the size of the SWFA datasets. The cost optimisation performance of existing SWFS approaches is still not satisfactory across all such scenarios. There is therefore a need for a dynamic hyper-heuristic approach that can effectively optimise the cost of SWFS in all of these scenarios by employing different meta-heuristic algorithms and exploiting their respective strengths in each scenario. Thus, the main objective of this paper is to propose a Completion Time Driven Hyper-Heuristic (CTDHH) approach for cost optimisation of SWFS in a cloud environment. The CTDHH approach employs four well-known population-based meta-heuristic algorithms, which act as Low Level Heuristic (LLH) algorithms. In addition, the CTDHH approach enhances the purely random selection mechanism of existing hyper-heuristic approaches by using the best computed workflow completion time as a high-level selector that dynamically picks a suitable algorithm from the pool of LLH algorithms after each run. A real-world cloud-based experimental environment has been used to evaluate the performance of the proposed CTDHH approach against five baseline approaches, i.e. four population-based approaches and an existing hyper-heuristic approach named the Hyper-Heuristic Scheduling Algorithm (HHSA). Several scenarios have also been considered to evaluate data-intensive and computation-intensive performance. Based on the experimental comparison, the proposed approach yields the most effective performance results for all considered scenarios.

Introduction

Scheduling the tasks of a submitted Scientific Workflow Application (SWFA) onto the available computational resources while optimising the cost of executing the SWFA is one of the most challenging processes of a Workflow Management System (WfMS) in a cloud computing environment [[1], [2], [3]]. The cost optimisation challenge of SWFS in the cloud requires the consideration of three main perspectives: (i) strong inter-dependencies between the SWFA tasks; this computation-intensiveness introduces complexity into the scheduling process, since the data of SWFA tasks needs to be transferred between the computational resources (i.e. VMs) of the cloud, (ii) the different sizes of SWFA datasets that must be considered by the scheduler, which significantly affect execution time and execution cost [[4], [5]]; data-intensiveness therefore needs to be considered when calculating the completion time (makespan) and total cost of executing the SWFA tasks on the available resources (i.e. VMs), and (iii) the different numbers of computational resources (VMs) requested by the user. Considering the above perspectives makes the SWFS process more complicated and demands a large amount of computational resources in terms of completion time and total computational cost [[6], [7], [8]]. Moreover, a large SWFA dataset can cause a significant increase in the dependencies among workflow tasks, and this increased task dependency ultimately lengthens the completion time required to process the SWFA on the available resources. Any delay in completion time can, in turn, negatively impact the cost optimisation of SWFS.
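
To make these perspectives concrete, the following minimal Python sketch shows how a scheduler could compute the makespan and total computational cost of one candidate task-to-VM mapping while charging inter-VM data transfers. It is an illustration only, not the cost model used in this paper; the task attributes, VM speeds, bandwidth and pricing parameters are hypothetical placeholders.

```python
# Illustrative sketch (not the authors' exact model): evaluate one candidate
# schedule by computing its makespan and a simple pay-per-use cost, charging
# data transfer only when a parent and child task run on different VMs.

def evaluate_schedule(tasks, schedule, vm_speed, vm_price, bandwidth):
    """tasks: {task: {"length": float, "inputs": {parent: bytes}}};
    schedule: {task: vm_id}. Returns (makespan, total_cost)."""
    finish = {}                                   # completion time of each task
    vm_busy = {vm: 0.0 for vm in set(schedule.values())}
    for t in _topological_order(tasks):
        vm = schedule[t]
        ready = 0.0                               # when all input data has arrived
        for parent, data in tasks[t]["inputs"].items():
            transfer = 0.0 if schedule[parent] == vm else data / bandwidth
            ready = max(ready, finish[parent] + transfer)
        start = max(ready, vm_busy[vm])           # each VM runs one task at a time
        finish[t] = start + tasks[t]["length"] / vm_speed[vm]
        vm_busy[vm] = finish[t]
    makespan = max(finish.values())
    cost = sum(busy * vm_price[vm] for vm, busy in vm_busy.items())  # pay-per-use
    return makespan, cost


def _topological_order(tasks):
    """Order tasks so that every parent precedes its children (DFS over a DAG)."""
    order, seen = [], set()

    def visit(t):
        if t not in seen:
            seen.add(t)
            for parent in tasks[t]["inputs"]:
                visit(parent)
            order.append(t)

    for t in tasks:
        visit(t)
    return order


# Hypothetical usage: two dependent tasks placed on two different VMs.
# tasks = {"t1": {"length": 100.0, "inputs": {}},
#          "t2": {"length": 200.0, "inputs": {"t1": 50e6}}}
# evaluate_schedule(tasks, {"t1": 0, "t2": 1},
#                   vm_speed={0: 10.0, 1: 20.0},
#                   vm_price={0: 0.01, 1: 0.02}, bandwidth=10e6)
```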

In the literature, several types of approaches have been proposed for SWFS [[2], [7], [9], [10], [11]]. Existing population-based meta-heuristic solutions have shown good performance when optimising over large search spaces. In contrast, single-solution-based meta-heuristics do not exhaustively search the scheduling problem space; instead, they use different underlying strategies to find a desired solution based on defined fitness criteria. Population-based meta-heuristics therefore tend to find good solutions with less computational effort than single-solution-based ones [[12], [13], [14], [15]]. However, each of these approaches has strengths and limitations that affect the SWFS process. Hybrid meta-heuristics use the best features of two or more traditional meta-heuristics (e.g., Genetic Algorithm, Ant Colony Optimisation) in each iteration to provide a better solution, but due to their complexity they may take longer to converge in each iteration than the traditional meta-heuristics [[3], [16], [17]]. The hyper-heuristic approach, on the other hand, is an emerging class of meta-heuristic algorithms integrated in a manner that utilises the strengths of the employed meta-heuristic algorithms to obtain an optimal solution. Hyper-heuristic mechanisms achieve better performance in terms of shorter execution time compared to other optimisation mechanisms, and they are capable of accelerating the run-time of a single meta-heuristic algorithm. Only a few works [[18], [19], [20], [21]] have considered utilising hyper-heuristics for SWFS, even though hyper-heuristics have the potential to find the most cost-optimal solutions across different scenarios. Thus, the aim of this research is to propose a completion time driven hyper-heuristic approach for cost optimisation of SWFS in a cloud environment across several scenarios. Such an approach can optimise the completion time as well as the total computational cost by dynamically selecting the most suitable meta-heuristic algorithm based on the completion time performance of the employed meta-heuristic algorithms.

The proposed Completion Time Driven Hyper-Heuristic (CTDHH) approach employs four well-known population-based meta-heuristic algorithms, which act as Low Level Heuristic (LLH) algorithms (i.e., genetic algorithm, particle swarm optimisation, invasive weed optimisation, and hybrid invasive weed optimisation). In addition, the proposed algorithm enhances the purely random selection mechanism of existing hyper-heuristic solutions by using the best computed workflow completion time as a high-level selector that picks a suitable algorithm from the pool of LLH algorithms after each run. The main aim of the proposed approach is to reduce the completion time and the total computational cost of executing the SWFA. Based on the lowest achieved completion time, the proposed algorithm dynamically guides the search for an optimal solution by continuously sorting the computed time scores (i.e. the completion times of previous runs) of all employed LLH algorithms for each considered scenario and after every run. The computed time scores are kept in a scoreboard table. For each run, the high-level selector adopts the LLH algorithm with the lowest computed time score for that scenario. The proposed dynamic hyper-heuristic algorithm continuously updates the scoreboard table by replacing an existing time score with a lower computed time score, which in turn affects the total computational cost value for that run. Finally, based on the scoreboard table, the proposed approach selects the most appropriate LLH algorithm for the next run. Consequently, the completion time driven selection mechanism allows the approach to reuse and exploit the strengths of the employed LLH algorithms when searching for an optimal solution to the targeted cost optimisation problem. A simplified sketch of this selection loop is given below.
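
As a simplified sketch of this completion-time-driven selection loop, the following Python snippet maintains a scoreboard of the lowest completion time recorded for each LLH and hands the next run to the LLH with the lowest score. It is not the authors' implementation: the callable run interface for each LLH (GA, PSO, IWO, HIWO) and the initial warm-up pass that gives every LLH a first score are assumptions made for illustration.

```python
import math


def ctdhh_selection_loop(llh_pool, scenario, num_runs):
    """llh_pool: {name: callable(scenario) -> (completion_time, cost)} -- assumed interface.
    Returns the best (completion_time, cost, llh_name) found and the final scoreboard."""
    scoreboard = {}                      # name -> lowest completion time recorded so far
    best = (math.inf, math.inf, None)
    for run in range(num_runs):
        if len(scoreboard) < len(llh_pool):
            # assumed warm-up: run every LLH once so the scoreboard is populated
            name = next(n for n in llh_pool if n not in scoreboard)
        else:
            # high-level selector: reuse the LLH with the lowest recorded completion time
            name = min(scoreboard, key=scoreboard.get)
        completion_time, cost = llh_pool[name](scenario)
        # keep only the lower time score, mirroring the scoreboard update described above
        scoreboard[name] = min(scoreboard.get(name, math.inf), completion_time)
        if completion_time < best[0]:
            best = (completion_time, cost, name)
    return best, scoreboard
```

In this sketch the scoreboard is kept per scenario; the paper similarly maintains and updates one such table for each considered scenario after every run.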

To evaluate and analyse the cost optimisation parameters (i.e. completion time and total computational cost) of the proposed approach, the authors evaluated it in a real-world cloud-based experimental environment, comparing it with five baseline approaches, i.e. four population-based approaches and an existing hyper-heuristic approach named the Hyper-Heuristic Scheduling Algorithm (HHSA). Several scenarios with different numbers of VMs and different sizes of SWFA datasets were considered. The data collected from this environment were analysed in terms of completion time (makespan) and total computational cost.

The key contributions of this research are as follows:

    A completion time driven hyper-heuristic approach for cost optimisation of SWFS has been proposed. The proposed CTDHH approach helps in optimising the completion time (makespan) and total execution cost of SWFS in the cloud computing environment.

    The proposed CTDHH approach is profitable for service consumers by reducing the total computational cost of utilising the computational resources of the cloud. At the same time, the proposed approach better satisfies user requirements (e.g. shorter completion time and lower computational cost).

    The proposed CTDHH approach helps service providers save energy and time by judiciously utilising computational resources. This ultimately helps to reduce computation cost as well as to handle computation-intensive and data-intensive SWFAs.

    This research opens new avenues for high-impact research combining SWFAs and cloud computing.

To improve readability, the list of abbreviations used in this paper is shown in Table 1.

This paper is organised as follows: Section 2 provides a comprehensive definition of SWFS and of the cost optimisation of SWFS. Section 3 describes the proposed Completion Time Driven Hyper-Heuristic (CTDHH) approach along with an example of the proposed algorithm. Section 4 discusses the evaluation using a real-world cloud-based experimental environment. Section 5 presents the results and discussion. Section 6 reviews related work, and Section 7 concludes the paper and outlines future work.

Section snippets

Background

This section provides comprehensive detail about the main stages of Scientific Workflow Scheduling (SWFS) and the cost optimisation of SWFS.

Completion time driven hyper-heuristic approach

This section presents the proposed dynamic hyper-heuristic algorithm for the cost optimisation challenge of SWFS in a cloud environment. The proposed algorithm is a new technique capable of reducing the computation time and total computational cost. In the following sections (Sections 3.1 Dynamic Hyper-Heuristic Algorithm (DHHA) and 3.2 Example of the proposed algorithm), the proposed algorithm and an explanatory example are presented. The proposed

Evaluation of the proposed CTDHH approach

The WfMS in cloud computing has the ability to handle requests from different domains of SWFAs. To execute SWFA datasets, high-performance resources, such as supercomputers, need to be delivered by the service provider (i.e. infrastructure as a service) [[22], [59], [60], [61], [62]]. By utilising WfMSs, cloud services enable scientists to define multi-stage computational and data processing pipelines that can be executed on resources with predefined QoS. In this way, the

Results and discussion

For the real-world evaluation environment, the proposed Completion Time Driven Hyper-Heuristic (CTDHH) approach has been compared with four baseline meta-heuristic approaches (i.e. Genetic Algorithm (GA), Particle Swarm Optimisation (PSO), Invasive Weed Optimisation (IWO), and Hybrid Invasive Weed Optimisation (HIWO)) and an existing hyper-heuristic approach named the Hyper-Heuristic Scheduling Algorithm (HHSA).

Nine scenarios have been utilised in the experiments, and each of

Related works

In the literature, several types of meta-heuristic combinations have been proposed for the scheduling problem. The most efficient technique is to hybridise two or more algorithms into a single meta-heuristic algorithm. Hybrid algorithms have shown good performance for the cost optimisation of the SWFS problem by leveraging the strengths of the combined algorithms. Single meta-heuristic algorithms are easier to implement in the WfMS of a cloud environment compared with hybrid

Conclusion

The CTDHH approach has been presented for the cost optimisation challenge of SWFS in a cloud environment. The proposed algorithm is a new technique capable of accelerating the run-time of a meta-heuristic algorithm. CTDHH uses a High Level Heuristic (HLH) strategy that employs four well-known population-based meta-heuristic algorithms, which act as the Low Level Heuristics (LLH). The main purpose of the HLH strategy is to intelligently guide the search process based on the

Acknowledgement

This work has been developed within the framework of the research project supported by the University Malaya Research Grant (UMRG) Scheme with the grant reference RG114-12ICT, awarded by the University of Malaya, Malaysia.


References (80)

  • Lombardi, F., et al., Secure virtualization for cloud computing, J. Netw. Comput. Appl. (2011)
  • Liu, X., et al., A novel general framework for automatic and cost-effective handling of recoverable temporal violations in scientific workflow systems, J. Syst. Softw. (2011)
  • Li, J., et al., Cost-efficient coordinated scheduling for leasing cloud resources on hybrid workloads, Parallel Comput. (2015)
  • Abrishami, S., et al., Deadline-constrained workflow scheduling in software as a service cloud, Sci. Iran. (2012)
  • Alkhanak, E.N., et al., Cost-aware challenges for workflow scheduling approaches in cloud computing environments: taxonomy and opportunities, Future Gener. Comput. Syst. (2015)
  • Stevens, T., et al., Multi-cost job routing and scheduling in grid networks, Future Gener. Comput. Syst. (2009)
  • Kousalya, G., et al., Workflow scheduling algorithms and approaches
  • Khan, S.U., et al., A cooperative game theoretical technique for joint optimization of energy consumption and response time in computational grids, IEEE Trans. Parallel Distrib. Syst. (2009)
  • Arabnejad, H., et al., List scheduling algorithm for heterogeneous systems by an optimistic cost table, IEEE Trans. Parallel Distrib. Syst. (2014)
  • Malawski, M., et al., Cost optimization of execution of multi-level deadline-constrained scientific workflows on clouds
  • Dieste, O., et al., Developing search strategies for detecting relevant experiments, Empir. Softw. Eng. (2009)
  • Chen, W., et al., Dynamic and fault-tolerant clustering for scientific workflows, IEEE Trans. Cloud Comput. (2016)
  • Bala, A., et al., Autonomic fault tolerant scheduling approach for scientific workflows in cloud computing, Concurr. Eng. (2015)
  • de Oliveira, D., et al., A provenance-based adaptive scheduling heuristic for parallel scientific workflows in clouds, J. Grid Comput. (2012)
  • Pandey, S., et al., A particle swarm optimization-based heuristic for scheduling workflow applications in cloud computing environments
  • Ambati, B.K., et al., Heuristic combinatorial optimization by simulated darwinian evolution: a polynomial time algorithm for the traveling salesman problem, Biol. Cybernet. (1991)
  • Tsai, C.-W., et al., Metaheuristic scheduling for cloud: A survey, IEEE Syst. J. (2013)
  • Singh, L., et al., A survey of workflow scheduling algorithms and research issues, Int. J. Comput. Appl. (2013)
  • Tsai, C.-W., et al., A hyper-heuristic scheduling algorithm for cloud, IEEE Trans. Cloud Comput. (2014)
  • Cowling, P.I., et al., Using a large set of low level heuristics in a hyperheuristic approach to personnel scheduling
  • Lin, J., et al., Differential evolution based hyper-heuristic for the flexible job-shop scheduling problem with fuzzy processing time
  • Wu, Z., et al., A market-oriented hierarchical scheduling strategy in cloud workflow systems, J. Supercomput. (2013)
  • Bittencourt, L.F., et al., HCOC: a cost optimization algorithm for workflow scheduling in hybrid clouds, J. Internet Serv. Appl. (2011)
  • Kaur, N., et al., Comparison of workflow scheduling algorithms in cloud computing, Int. J. Adv. Comput. Sci. Appl. (2011)
  • Ma, Y., et al., Marginal pricing based scheduling strategy of scientific workflow using cost-gradient metric
  • Yuan, D., et al., A cost-effective strategy for intermediate data storage in scientific cloud workflow systems
  • Miu, T., et al., Predicting the execution time of workflow activities based on their input features
  • Talukder, A., et al., Multiobjective differential evolution for scheduling workflow applications on global grids, Concurr. Comput.: Pract. Exper. (2009)
  • Abrishami, S., et al., Deadline-constrained workflow scheduling algorithms for infrastructure as a service clouds, Future Gener. Comput. Syst. (2013)
  • Yu, J., et al., A budget constrained scheduling of workflow applications on utility grids using genetic algorithms

Ehab Nabiel Alkhanak is a Ph.D. graduate from the University of Malaya in Malaysia. He has completed his Ph.D. in computer science (Software Engineering). He received a Master of Computer Science in 2009 from the same university. He has published several ISI-cited papers and his research interests are service oriented architecture, web-services, cost optimisation, workflow scheduling, scientific workflow application, meta-heuristic methods and cloud computing.

Sai Peck Lee is a professor at Faculty of Computer Science and Information Technology, University of Malaya. She obtained her Master of Computer Science from University of Malaya, her Diplôme d’Études Approfondies (D.E.A.) in Computer Science from Université Pierre et Marie Curie (Paris VI) and her Ph.D. degree in Computer Science from Université Panthéon-Sorbonne (Paris I). Her current research interests in Software Engineering include Object-Oriented Techniques and CASE tools, Software Reuse, Requirements Engineering, Application and Persistence Frameworks, Information Systems and Database Engineering. She has published an academic book, a few book chapters as well as more than 100 papers in various local and international conferences and journals. She has been an active member in the reviewer committees and programme committees of several local and international conferences. She is currently in several Experts Referee Panels, both locally and internationally.
