Scheduling parameter sweep workflow in the Grid based on resource competition

https://doi.org/10.1016/j.future.2013.01.005

Abstract

Workflow technology has been adopted in scientific domains to orchestrate and automate scientific processes in order to facilitate experimentation. Such scientific workflows often involve large data sets and intensive computation that necessitate the use of the Grid. To execute a scientific workflow in the Grid, tasks within the workflow are assigned to Grid resources. Thus, to ensure efficient execution of the workflow, Grid workflow scheduling is required to manage the allocation of Grid resources.

Although many Grid workflow scheduling techniques exist, they are mainly designed for the execution of a single workflow. This is not the case with parameter sweep workflows, which are used for parametric study and optimisation. A parameter sweep workflow is executed numerous times with different input parameters in order to determine the effect of each parameter combination on the experiment. While executing multiple instances of a parameter sweep workflow in parallel can reduce the time required for the overall execution, this parallel execution introduces new challenges to Grid workflow scheduling. Not only is a scheduling algorithm that is able to manage multiple workflow instances required, but this algorithm also needs the ability to schedule tasks across multiple workflow instances judiciously, as tasks may require the same set of Grid resources. Without appropriate resource allocation, a resource competition problem could arise.

We propose a new Grid workflow scheduling technique for parameter sweep workflows, called the Besom scheduling algorithm. The scheduling decision of our algorithm is based on the resource dependencies of tasks in the workflow, as well as conventional Grid resource-performance metrics. In addition, the proposed technique is extended to handle loop structures in scientific workflows without using existing loop-unrolling techniques. The Besom algorithm is evaluated using simulations with a variety of scenarios. A comparison between the simulation results of the Besom algorithm and those of three existing Grid workflow scheduling algorithms shows that the Besom algorithm performs better than the existing algorithms for workflows that have complex structures and that involve overlapping resource dependencies of tasks.

Highlights

► We propose a new Grid workflow scheduling algorithm for parameter sweep workflows.
► The Besom algorithm uses resource competition to schedule multiple workflow instances.
► A single-level feedback loop can be handled during workflow execution.
► The Besom algorithm performs better for workflows with complex structure and setting.

Introduction

Computational science that involves large data sets and intensive computation is performed on distributed computing networks. Such scientific applications necessitate the use and management of a supercomputing infrastructure. The Grid [1], which offers supercomputing power through shared distributed resources, can be used to fulfil this necessity. However, several different functions, such as resource discovery and security management [1], [2], are required in order to manage and provide access to distributed Grid resources. To address this need, software tools have been developed for these functionalities and are implemented in a multilayered architecture [1], [3]. Software at a lower layer provides services to the layer above it, with end-user applications (specifically, e-Science applications) at the top layer.

To facilitate e-Science applications, workflow technology has been introduced into scientific domains and has become a necessary tool to help scientists perform their work in the Grid environment [4], [5], [6], [7]. Traditionally, workflow is a representation of business processes that consists of a set of tasks and the dependencies between them, specified in control structures or patterns [8]. Workflow has the ability to orchestrate services from heterogeneous sources, such as web services and existing software packages, to streamline business processes. In the same manner, scientific workflows can be utilised to orchestrate heterogeneous computation procedures developed by different parties to represent scientific processes in e-Science. Scientific workflow management systems such as Kepler [9] and Pegasus [10] (which usually provide a graphical user interface for composing workflows) can then be used to automatically execute scientific processes in the Grid.

In order to execute a scientific workflow in the Grid, a workflow scheduler is required. The workflow scheduler employs various Grid workflow scheduling algorithms to decide which task in the workflow is to be executed by which Grid resource so that the execution of the scientific workflow can meet constraints or criteria such as performance and cost. However, scheduling workflows in the Grid can be challenging. Unlike traditional scheduling problems in which resources are likely to be static [11], the distributed resources in the Grid are heterogeneous and may change their availability. Because the Grid has less centralised control [1], [12], a resource may join the Grid and make itself available to run applications, but may also become unavailable due to the decision of its owner or hardware failure. Scheduling in the Grid therefore usually needs to be done rapidly in order to make use of the immediate resources.
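To make the task-to-resource decision concrete, the following is a minimal sketch of the Min–Min heuristic, one of the existing batch-mode algorithms used as a comparison baseline later in this paper. It is not the Besom algorithm. The sketch assumes independent tasks with known run-time estimates; the function name `min_min`, the task and resource names, and all numbers are hypothetical and chosen only for illustration.

```python
def min_min(exec_time, resources):
    """Min-Min sketch for independent tasks.

    exec_time[task][resource] is an assumed run-time estimate;
    a missing resource key means the task cannot run there.
    """
    ready = {r: 0.0 for r in resources}  # time at which each resource is free
    unscheduled = set(exec_time)
    schedule = []  # entries are (task, resource, start, finish)
    while unscheduled:
        best = None
        # Find the (task, resource) pair with the earliest completion time.
        for task in sorted(unscheduled):
            for r in resources:
                if r not in exec_time[task]:
                    continue
                finish = ready[r] + exec_time[task][r]
                if best is None or finish < best[3]:
                    best = (task, r, ready[r], finish)
        if best is None:
            raise ValueError("some task cannot run on any listed resource")
        task, r, start, finish = best
        ready[r] = finish  # the chosen resource is busy until the task ends
        unscheduled.remove(task)
        schedule.append(best)
    return schedule, max(ready.values(), default=0.0)


# Hypothetical run-time estimates for three tasks on two resources.
exec_time = {
    "t1": {"r1": 4, "r2": 6},
    "t2": {"r1": 2, "r2": 3},
    "t3": {"r1": 5, "r2": 1},
}
schedule, makespan = min_min(exec_time, ["r1", "r2"])
```

Because the heuristic always commits the pair that finishes earliest, short tasks are placed first and long tasks fill in afterwards. The sketch ignores task dependencies, data transfer, and changes in resource availability, all of which a Grid workflow scheduler would also have to consider.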

In some scientific experiments, the same process is repeated several times with varying input parameters to study or to optimise the parameters in the experiments [13]. For example, quantum chemical calculations optimise four parameters to give the best pseudo-atom surface [14]. This process is called parameter sweep and can also be automated using a workflow. Abramson, Enticott, and Altintas [14] proposed a technique to parallelise the execution of a parameter sweep workflow running in the Grid environment by cloning the repeated tasks to improve the overall execution time. This can be viewed as concurrently executing multiple workflow instances of the same workflow specification. Because every instance then competes for the same set of resources, the scheduling of tasks and the allocation of Grid resources must allow the execution of these multiple workflow instances to meet execution constraints.
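As a rough illustration of what a parameter sweep produces, the sketch below enumerates one workflow instance per parameter combination. The function name `sweep_instances` and the parameter names and values are hypothetical; they are not taken from Nimrod/K or from the experiments cited above.

```python
from itertools import product


def sweep_instances(parameter_space):
    """Yield one dict of parameter values per workflow instance.

    parameter_space maps an (assumed) parameter name to the list of
    values to be swept for that parameter.
    """
    names = sorted(parameter_space)  # fixed order keeps the output reproducible
    for values in product(*(parameter_space[name] for name in names)):
        yield dict(zip(names, values))


# Hypothetical sweep: 2 values of alpha x 3 values of beta = 6 instances.
space = {"alpha": [0.1, 0.2], "beta": [1, 2, 3]}
instances = list(sweep_instances(space))
```

Each dictionary corresponds to one instance of the same workflow specification, so the number of concurrently schedulable instances grows multiplicatively with the number of values swept per parameter.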

Although many Grid workflow scheduling techniques exist, most of these scheduling algorithms only deal with a single Grid workflow at a time and assume that Grid workflows contain no loop structure [15], [16]. These techniques neglect the issues that may arise when multiple instances of a scientific workflow are executed in parallel and in iteration. Another issue is that existing Grid workflow scheduling algorithms do not take into account the dependency of tasks on specialised resources; nor do they consider the popularity of a resource [16], [17], [18]. For example, assigning a lengthy task to a resource that is shared by many tasks might delay the overall execution. This limitation is amplified with parallel execution of parameter sweep workflows due to the fact that every instance of a parameter sweep workflow, as mentioned earlier, requires the same set of resources. The situation may be exacerbated if there is competition for a scarce special resource.
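The notion of resource popularity can be sketched as a simple count of how many task instances are able to use each resource. This is only an illustration of the idea under stated assumptions, not the metric defined by the Besom algorithm; the function name `resource_popularity`, the task names, and the resource names are all hypothetical.

```python
from collections import Counter


def resource_popularity(task_resources, instance_count):
    """Count, per resource, the task instances that may compete for it.

    task_resources maps a task name to the set of resources able to run it;
    every workflow instance is assumed to contain every task exactly once.
    """
    counts = Counter()
    for resources in task_resources.values():
        for resource in resources:
            counts[resource] += instance_count
    return counts


# Hypothetical workflow: "simulate" can only run on cluster_a.
task_resources = {
    "prepare": {"cluster_a", "cluster_b"},
    "simulate": {"cluster_a"},
    "analyse": {"cluster_a", "cluster_b"},
}
popularity = resource_popularity(task_resources, instance_count=4)
most_contended = max(popularity, key=popularity.get)
```

In this example cluster_a is usable by every task and is the only option for "simulate", so placing a lengthy "prepare" or "analyse" task there could delay the tasks that have no alternative.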

In this paper, we propose a scheduling technique for multiple scientific workflow instances, based on the resource dependencies of tasks, that is also able to handle loop structures. The proposed algorithm aims to minimise the makespan of the parameter sweep workflow execution. We use parameter sweep workflows as a case study to illustrate the execution of workflows with high resource competition.

The rest of this paper is structured as follows. Section 2 describes the related work in the area of Grid workflow scheduling. The proposed parameter sweep workflow scheduling technique is explained in Section 3, and the extension for loop handling is explained in Section 4. The scheduler that implements the proposed technique is briefly described in Section 5, while the simulation results and the evaluation are presented in Section 6. Finally, Section 7 concludes this paper along with potential future research directions.

Related work

To facilitate the understanding of the context of this research, this section presents existing work related to scientific workflows and workflow scheduling. The workflow concept is first explained, along with the scientific workflow, which is the kind of workflow used in science and research communities. The parameter sweep workflow, a special class of scientific workflow used in parametric study, is subsequently described. Lastly, several existing workflow scheduling techniques are described.

Scheduling parallel execution of parameter sweep workflow

This section describes our proposed scheduling technique named “Besom”; the word denotes a type of broom associated with witches and is meant to imply that the proposed algorithm can cleverly “sweep” parameters. To elaborate the technique, each of the terms involved is first explained, followed by the explanation of the Besom scheduling algorithm. For ease of understanding, in this section we assume that the parameter sweep workflow does not contain loop structures. The support for loop

Scheduling cyclic workflow graph

Usually, Grid workflow scheduling algorithms, such as the Min–Min and HEFT algorithms, assume that workflows lack loop structures [18], [36], [38], and leave the handling of loops as a separate concern that can be addressed by methods such as the loop unrolling technique suggested by Prodan and Fahringer in [24]. However, to unroll a loop, the number of loop iterations must be known or predicted [24]. This information might not be available or the prediction might not be reliable in every

Scheduler implementation

In this section, we briefly describe the prototype design of the Grid workflow scheduler that uses the Besom scheduling algorithm. The prototype has been developed as a parameter sweep workflow scheduler in Nimrod/K [13], [14], which is built on the Kepler system [9] and provides workflow schedulers with access to the underlying features of Grid middleware tools as well as Nimrod’s parameter exploration tools. It gives a scheduler access to all the file transfers, compute resource scheduling

Performance evaluation

To evaluate and analyse the behaviour of the Besom scheduling algorithm, we compare its performance with three existing batch-mode scheduling algorithms: the Min–Min algorithm, the Max–Min algorithm and the XSufferage algorithm. In order to control the resource availability and network conditions, we use a simulation environment, implemented in Nimrod/K [14], and create dummy actors to represent tasks in the parameter sweep workflow. These actors simulate task execution by going into sleep

Conclusion and future work

In this research, we proposed a new Grid workflow scheduling algorithm which aims to minimise the makespan of the entire execution of parameter sweep workflows. In addition, the algorithm is designed to reduce bottlenecks in workflow executions in the Grid (by considering resource competition); to manage multiple workflow instances; and to schedule workflows incorporating a loop structure.

The Besom algorithm works by retrieving information about the resource requirements of the tasks in the

References (60)

  • W. Sudholt et al., Application of grid computing to parameter sweeps and optimizations in molecular modeling, Future Generation Computer Systems (2005)
  • D. Abramson et al., Embedding optimization in computational science workflows, Journal of Computational Science (2010)
  • Z. Shi et al., Scheduling workflow applications on processors with different capabilities, Future Generation Computer Systems (2006)
  • I.T. Foster, The anatomy of the grid: enabling scalable virtual organizations, in: Proceedings of the 7th International...
  • I. Foster et al., The Grid 2: Blueprint for A New Computing Infrastructure (2004)
  • I. Foster, Globus toolkit version 4: software for service-oriented systems, in: IFIP International Conference on...
  • Z. Jianting, Ontology-driven composition and validation of scientific grid workflows in Kepler: a case study of...
  • J. Zhang et al., Using web services and scientific workflow for species distribution prediction modeling
  • D.P. Deana, Supporting large-scale science with workflows, in: Proceedings of the 2nd Workshop on Workflows in support...
  • Y. Gil et al., Examining the challenges of scientific workflows, Computer (2007)
  • W.M.P.v.d. Aalst, The application of petri nets to workflow management, The Journal of Circuits, Systems and Computers (1998)
  • B. Ludäscher et al., Scientific workflow management and the Kepler system, Concurrency and Computation: Practice and Experience (2006)
  • E. Deelman et al., Pegasus: a framework for mapping complex scientific workflows onto distributed systems, Scientific Programming (2005)
  • M.L. Pinedo, Scheduling: Theory, Algorithms, and Systems (2008)
  • I. Foster, What is the grid? A three point checklist, GRIDtoday (2002)
  • D. Abramson et al., Parameter exploration in science and engineering using many-task computing, IEEE Transactions on Parallel and Distributed Systems (2011)
  • D. Abramson, C. Enticott, I. Altintas, Nimrod/K: towards massively parallel dynamic grid workflows, in: Proceedings of...
  • M. Wieczorek et al., Taxonomies of the multi-criteria grid workflow scheduling problem
  • J. Yu et al., Workflow scheduling algorithms for grid computing
  • H. Topcuoglu et al., Performance-effective and low-complexity task scheduling for heterogeneous computing, IEEE Transactions on Parallel and Distributed Systems (2002)
  • M. Maheswaran, S. Ali, H.J. Siegel, D. Hensgen, R.F. Freund, Dynamic matching and scheduling of a class of independent...
  • A. Barker et al., Scientific workflow: a survey and research directions
  • W.M.P.V.D. Aalst et al., Workflow patterns, Distributed and Parallel Databases (2003)
  • J. Rao et al., A survey of automated web service composition methods
  • T. Oinn et al., Taverna: a tool for the composition and enactment of bioinformatics workflows, Bioinformatics (2004)
  • Y. Gil et al., Wings: intelligent workflow-based design of computational experiments, IEEE Intelligent Systems (2011)
  • R. Prodan et al., Scientific grid workflows
  • Taverna project. Available:...
  • W. Sudholt et al., Parameter scan of an effective group difference pseudopotential using grid computing, New Generation Computing (2004)
  • W. Sudholt, K.K. Baldridge, D. Abramson, C. Enticott, S. Garic, Applying grid computing to the parameter sweep of a...

    Sucha Smanchat is currently a lecturer at the Faculty of Information Technology, King Mongkut’s University of Technology North Bangkok, Thailand. He obtained his Ph.D. at Monash University in Melbourne, Australia. During his study, he was involved with the development of a prototype scheduler for the Nimrod/K system. His current research interests are in Grid and Cloud workflow scheduling techniques.

    Maria Indrawan works in the Faculty of Information Technology at Monash University. Maria received her Ph.D. from Monash University in Melbourne, Australia. Her areas of interest and expertise include data management in e-science, data mining, information retrieval and pervasive computing. She has authored numerous articles in these topic areas and has served on a number of program committees of international conferences. In 2009 she spent her sabbatical at the San Diego Supercomputer Centre (SDSC) and California Institute of Telecommunication and Information Technology (CALIT2) in the USA, working with scientists from such varied fields as computer science, biochemistry and physics. Through exposure to a number of e-science projects at those institutions, she developed a greater interest in data management for e-science.

    Sea Ling works in the Faculty of Information Technology at Monash University as a Senior Lecturer. Sea received his Ph.D. from Monash University in Melbourne, Australia. He is interested in Petri net theory, workflows and software engineering. Recently, his research involves applying these theories to various research areas such as Grid computing and pervasive computing. He has supervised a number of research students in these topic areas, authored numerous articles and has served on a number of program committees of international conferences.

    Colin Enticott is currently employed as a research scientist by the MeSsAGE Lab group at Monash University, Melbourne. He has been active in the area of distributed computing for over 10 years. His research focuses on Grid computing and workflow management, and he is about to complete his Ph.D. Previous projects include the Nimrod Portal and deploying Nimrod on GrangeNet.

    David Abramson is a full professor of computer science at Monash University, Clayton, Australia, where he was the department chair from 1997 to 2002. He is currently the Director of the Monash e-Education Centre, the Science Director of the Monash e-Research Centre and the Director of the Monash e-Science and Grid Engineering Lab (MeSsAGE).

    Before joining Monash University, in 1997, he held appointments at Griffith University, CSIRO, and RMIT. At CSIRO, he was the program leader of the Division of Information Technology High Performance Computing Program, and was also an adjunct associate professor at RMIT in Melbourne. He has held senior management positions in the Cooperative Research Centre for Intelligent Decisions Systems and the Cooperative Research Centre for Enterprise Distributed Systems.

    He has chaired a number of international conferences, and has published more than 200 papers and technical documents. He has participated in seminars and received awards around Australia and internationally, and has received more than $8 million in research grants. His current research interests are in high-performance computer systems design and software engineering tools for programming parallel and distributed supercomputers. He is a Fellow of the Association for Computing Machinery (ACM), a Fellow of the Australian Computer Society (ACS) and a senior member of the IEEE.
