Resource-efficient workflow scheduling in clouds
Introduction
As resource capacity in clusters and, more recently, clouds becomes increasingly abundant with the prevalence of large-scale multi-core systems and advances in virtualization technologies, this abundance has been relentlessly exploited, particularly to improve application performance. Scientific workflows (such as Montage [1], CyberShake [2], LIGO [3], Epigenomics [4] and SIPHT [5]) in particular can take great advantage of abundant resources as they are mostly resource-intensive with a good degree of scalability. However, the exploitation of resource abundance is a double-edged sword in that the performance improvement from such exploitation is often achieved at the expense of resource efficiency.
The inefficient use of resources when executing scientific workflows results from both the excessive amount of resources provisioned and the wastage from unused resources (Fig. 1), including idle time among task runs. The optimization of resource efficiency is of great practical importance considering its numerous benefits in the economic and environmental sustainability of large-scale computer systems like corporate data centers and clouds.
While previous work on workflow scheduling has focused on improving performance (i.e., reducing makespan) with a limited amount of resources (resource scarcity), the advent of multi-core processors and cloud computing (resource abundance) has brought much attention to resource efficiency. Existing workflow scheduling algorithms may be adapted to deal with resource abundance by limiting the number of resources to be used (a resource limit) at the time of scheduling. However, this is only a partial and ad hoc solution; for example, the schedule in Fig. 1c with a resource limit of 2 shows that makespan increases by 21% compared to the schedule in Fig. 1b in return for the use of one less resource. Moreover, the efficacy of such a solution varies across applications, and even across executions of a particular application with different inputs (e.g., data and/or parameter values). Dynamic resource provisioning with public clouds as in [7], [8] might be an alternative; however, the poor resource utilization within the billing hour, caused by the uneven widths of different levels in a workflow (see Fig. 1), still remains. Since it is very difficult, if not impossible, to find the optimal amount of resources for scheduling a given workflow application, and since current workflow scheduling algorithms perform quite well in terms of makespan, the post-processing of output workflow schedules is a practical approach to optimizing resource usage.
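The within-the-hour waste mentioned above can be made concrete with a minimal sketch. Assuming hourly billing (each VM is charged for whole hours), uneven per-VM busy times leave paid-for idle capacity; the function name and numbers below are illustrative, not from the paper:

```python
import math

def billed_vs_used_hours(busy_hours_per_vm):
    """Illustrative only: with hourly billing, each VM is charged for
    ceil(busy_time) hours, so uneven per-VM load leaves paid-for idle time."""
    used = sum(busy_hours_per_vm)
    billed = sum(math.ceil(h) for h in busy_hours_per_vm)
    return used, billed

# e.g., three VMs busy for 0.2, 1.1 and 2.5 hours
used, billed = billed_vs_used_hours([0.2, 1.1, 2.5])
print(round(used, 1), billed)  # 3.8 6
```

Here 3.8 hours of actual work incur 6 billed VM-hours; consolidating tasks onto fewer VMs shrinks this gap.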
In this paper, we consider the problem of using abundant resources efficiently when running large-scale scientific workflows. To this end, we develop Maximum Effective Reduction (MER), a workflow schedule optimization algorithm that effectively trades makespan increase for resource usage reduction, minimizing the resource usage of a workflow schedule generated by any particular scheduling algorithm. MER is a post-optimization algorithm that takes an existing workflow schedule and consolidates its tasks onto fewer resources than those used by the input schedule. The algorithm aims to find the minimal makespan increase that reduces resource usage the most; this reduction is defined as effective reduction (ER) in this paper. ER measures resource efficiency primarily as the difference between the resource usage reduction and the makespan increase of a resultant schedule relative to the input schedule.
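The ER notion above can be sketched as follows. This is a hypothetical formulation for illustration only (the paper's exact definition is given in Section 2): we take the difference between the fractional resource usage reduction and the fractional makespan increase.

```python
def effective_reduction(base_usage, new_usage, base_makespan, new_makespan):
    """Illustrative ER sketch: resource usage reduction net of makespan
    increase, both expressed as fractions of the input schedule's values.
    This is an assumed formulation, not the paper's exact definition."""
    usage_reduction = (base_usage - new_usage) / base_usage
    makespan_increase = (new_makespan - base_makespan) / base_makespan
    return usage_reduction - makespan_increase

# e.g., halving resource usage at the cost of a 10% longer makespan
er = effective_reduction(base_usage=8, new_usage=4,
                         base_makespan=100, new_makespan=110)
print(round(er, 2))  # 0.4
```

A positive value indicates that the usage saved outweighs the delay incurred, which is the trade-off MER seeks to maximize.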
In our previous work [9], we have shown that the resource usage of a workflow schedule can be reduced by consolidating tasks into idle/inefficient time slots; however, the degree of such reduction is largely limited by the preservation of makespan (Fig. 1d). MER significantly extends the primitive solution in [9] by allowing the makespan to increase in order to maximize the reduction in resource usage. In particular, MER incorporates a simple yet effective heuristic that, based on the estimated ER, determines how much delay in makespan to allow (the makespan delay limit, or simply delay limit) given the tasks being considered for consolidation. We have verified the efficacy of this heuristic by comparing it against the best empirical delay limits found in our experiments. A preliminary version of this work can also be found in [10].
Based on results obtained from our extensive simulations using scientific workflow traces, we demonstrate that MER is capable of reducing the amount of resources actually used by 54% with an average makespan increase of less than 10%. The efficacy of MER is further verified by results, from a comprehensive set of experiments with varying makespan delay limits, that show the resource usage reduction, the makespan increase and the trade-off between them for various workflow applications.
The rest of this paper is organized as follows. Section 2 describes our workflow schedule optimization problem. In Section 3, we detail our solution algorithm. Section 4 demonstrates the efficacy of MER with results obtained from our extensive simulations. Section 5 provides a survey of related work, followed by our conclusion in Section 6.
Section snippets
The problem of workflow schedule optimization
In this section, we define the workflow schedule optimization problem, describing our workflow and system models.
Maximum effective reduction algorithm
The Maximum Effective Reduction (MER) algorithm is a workflow schedule optimization technique that optimizes the trade-off between makespan increase and resource usage reduction for any workflow scheduling algorithm. MER consists of the following phases/sub-algorithms: (1) delay limit identification, (2) task consolidation and (3) resource consolidation.
In essence, MER seeks to optimize a workflow schedule by consolidating tasks in two ways: (1) filling idle time slots resulting from data
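The makespan-preserving side of this consolidation can be sketched as a greedy packing of tasks, with their (start, finish) times fixed by the input schedule, onto as few resources as possible. This is an illustrative sketch only: MER's actual consolidation also honours data dependencies and the allowed delay limit, neither of which is modelled here.

```python
def consolidate(tasks):
    """Illustrative sketch of task consolidation: pack tasks with fixed
    (start, finish) times from an input schedule onto as few resources as
    possible by filling idle time slots. Dependencies and delay limits,
    which MER also handles, are not modelled here."""
    resources = []  # each entry holds the finish time of that resource's last task
    for start, finish in sorted(tasks):
        # reuse the first resource that is idle by this task's start time
        for i, last_finish in enumerate(resources):
            if last_finish <= start:
                resources[i] = finish
                break
        else:
            resources.append(finish)  # no idle slot fits: use one more resource
    return len(resources)

# four tasks that an input schedule could have spread over four resources
print(consolidate([(0, 3), (1, 2), (3, 5), (2, 4)]))  # 2
```

Because every task keeps its original time slot, the makespan is unchanged; MER's delay limit then relaxes exactly this constraint to consolidate further.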
Evaluation
In this section, we evaluate MER with four different scheduling algorithms and five different workflow applications. The scheduling algorithms we examine are Critical Path First (CPF) [9], Dynamic Critical Path (DCP) [12], Critical-Path-on-a-Processor (CPOP) [13] and a greedy algorithm based on earliest finish time (EFT). We have implemented CPOP and EFT with a slight modification in order to run with an unlimited number of resources. Since the performance of scheduling algorithms is not
Related work
In this section, we review previous studies on workflow scheduling and discuss resource efficiency.
The execution of scientific workflows is typically planned and coordinated by schedulers/resource managers (e.g., [16], [17]) particularly with distributed resources. At the core of these schedulers are scheduling algorithms.
Traditionally, workflow scheduling focuses on the minimization of makespan (i.e., high performance) within tightly coupled computer systems like compute clusters with an
Conclusion
The abundance of cloud resources and their elasticity with a pay-as-you-go pricing model provide great opportunities for scientists and engineers, particularly in terms of cost and availability. However, current public/commodity cloud solutions are neither designed explicitly with scientific computing in mind, nor on par with the performance of traditional HPC systems. The performance and cost of a cloud cluster deployment are heavily dependent on how effectively the resource
Acknowledgement
Dr. Young Choon Lee would like to acknowledge the support of the Australian Research Council Discovery Early Career Researcher Award Grant DE140101628. Prof. Albert Zomaya would like to acknowledge the support of the Australian Research Council Discovery Grant DP130104591. Dr. Hyuck Han's work was partly supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (NRF-2014R1A1A2055032).
References (23)
- et al., Multi-objective list scheduling of workflow applications in distributed computing infrastructures, J. Parallel Distrib. Comput. (2014)
- et al., Montage: a grid portal and software toolkit for science-grade astronomical image mosaicking, Int. J. Comput. Sci. Eng. (2009)
- et al., CyberShake: a physics-based seismic hazard model for Southern California, Pure Appl. Geophys. (2010)
- et al., LIGO: the Laser Interferometer Gravitational-wave Observatory, Science (1992)
- S. Bharathi, A. Chervenak, E. Deelman, G. Mehta, M.-H. Su, K. Vahi, Characterization of scientific workflows, in: ...
- et al., High-throughput, kingdom-wide prediction and annotation of bacterial non-coding RNAs, PLoS ONE (2008)
- et al., The case for energy-proportional computing, Computer (2007)
- M. Mao, M. Humphrey, Auto-scaling to minimize cost and meet application deadlines in cloud workflows, in: Proceedings...
- C. Lin, S. Lu, SCPOR: an elastic workflow scheduling algorithm for services computing, in: Proceedings of the 2011 IEEE...
- Y.C. Lee, A.Y. Zomaya, Stretch out and compact: workflow scheduling with resource abundance, in: Proceedings of the...