Multi-objective scheduling of many tasks in cloud platforms

doi:10.1016/j.future.2013.09.006

Future Generation Computer Systems

Volume 37, July 2014, Pages 309-320

Highlights

• We propose an ordinal optimized method for multi-objective many-task scheduling.
• We prove the suboptimality of the proposed method through mathematical analysis.
• Our method significantly reduces scheduling overhead by introducing a rough model.
• Our method delivers a set of semi-optimal good-enough scheduling solutions.
• We demonstrate the effectiveness of the method on a real-life workload benchmark.

Abstract

The scheduling of a many-task workflow in a distributed computing platform is a well-known NP-hard problem. The problem is even more complex and challenging when virtualized clusters are used to execute a large number of tasks in a cloud computing platform. The difficulty lies in satisfying multiple objectives that may conflict with one another. For instance, it is difficult to minimize the makespan of many tasks while also reducing the resource cost and preserving fault tolerance and/or quality of service (QoS). These conflicting requirements and goals are difficult to optimize because of unknown runtime conditions, such as resource availability and random workload distributions. Instead of taking a very long time to generate an optimal schedule, we propose a new method that generates suboptimal, or sufficiently good, schedules for smooth multitask workflows on cloud platforms.

Our new multi-objective scheduling (MOS) scheme is specially tailored for clouds and is based on the ordinal optimization (OO) method, which was originally developed by the automation community for the design optimization of very complex dynamic systems. We extend the OO scheme to meet the special demands of cloud platforms that apply to virtual clusters of servers from multiple data centers. We prove the suboptimality through mathematical analysis. The major advantage of our MOS method lies in its significantly reduced scheduling overhead combined with close-to-optimal performance. Extensive experiments were carried out on virtual clusters with 16 to 128 virtual machines. The multitasking workflow is obtained from a real scientific LIGO workload for earth gravitational wave analysis. The experimental results show that our proposed algorithm rapidly and effectively generates a small set of semi-optimal scheduling solutions. On a 128-node virtual cluster, the method reduces the search time for semi-optimal workflow schedules by a factor of a thousand compared with the Monte Carlo and Blind Pick methods used for the same purpose.

Introduction

Large-scale workflow scheduling demands efficient and simultaneous allocation of heterogeneous CPU, memory, and network bandwidth resources for executing a large number of computational tasks. This resource allocation problem is NP-hard [1], [2]. Scheduling many dependent or independent tasks effectively on distributed resources, which may be virtualized clusters of servers in a cloud platform, makes the problem even more complex and challenging to solve with guaranteed solution quality.

The many-task computing paradigms were treated in [3], [4], [5]. These paradigms pose new challenges to the scalability problem, because they may contain large volumes of datasets and loosely coupled tasks. The optimization requires achieving multiple objectives. For example, it is rather difficult to minimize the scheduling makespan and the total cost while preserving fault tolerance and the QoS at the same time. Many researchers have suggested heuristics for this problem [6].

The execution of a large-scale workflow encounters a high degree of randomness in the system and workload conditions [7], [8], such as unpredictable execution times, variable cost factors, and fluctuating workloads, which make the scheduling problem computationally intractable [9]. The lack of information on runtime dynamicity defies the use of deterministic scheduling models, in which the uncertainties are either ignored or simplified with an observed average.
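To illustrate why replacing uncertainty with an observed average can mislead a scheduler, the following minimal sketch compares two makespan estimates for one fixed assignment. Everything in it is an assumption made for illustration and is not taken from the paper: the two-VM assignment, the exponentially distributed task runtimes, and the function names.

```python
import random

def simulate_loads(vm_task_means, rng):
    """Draw one random runtime per task and return the total load of each VM."""
    return [sum(rng.expovariate(1.0 / mean) for mean in means)
            for means in vm_task_means]

def compare_estimates(vm_task_means, trials=2000, seed=0):
    """Return (makespan of averaged loads, average of simulated makespans)."""
    rng = random.Random(seed)
    samples = [simulate_loads(vm_task_means, rng) for _ in range(trials)]
    # Deterministic view: average each VM's load first, then take the maximum.
    avg_loads = [sum(s[i] for s in samples) / trials
                 for i in range(len(vm_task_means))]
    makespan_of_averages = max(avg_loads)
    # Stochastic view: take the maximum inside each trial, then average.
    average_makespan = sum(max(s) for s in samples) / trials
    return makespan_of_averages, average_makespan

# Hypothetical assignment: two VMs, mean task runtimes in seconds.
assignment = [[4.0, 6.0], [5.0, 5.0]]
det, sto = compare_estimates(assignment)
print(f"makespan of averaged loads: {det:.2f}")
print(f"average simulated makespan: {sto:.2f}")
```

Because the maximum is taken per trial in the stochastic view, its average can never fall below the deterministic figure, so a model built on averages tends to be optimistic about the makespan.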

Structural information of the workflow scheduling problem sheds light on its inner properties and opens the door to many heuristic methods. The no-free-lunch theorems [10] suggest that, without prior structural knowledge, all search algorithms for an optimum of a complex problem perform exactly the same. We need to dig into prior knowledge of the randomness, or reveal a relationship between the scheduling policy and the performance metrics applied.

The emerging cloud computing paradigm [11], [12], [13] attracts industrial, business, and academic communities. Cloud platforms are appealing for handling many loosely coupled tasks simultaneously. Our LIGO [14] benchmark programs are carried out on a virtualized cloud platform with a variable number of virtual clusters, built with many virtual machines on fewer physical machines and virtual nodes, as shown in Fig. 1 of Section 3. However, because many-task workloads fluctuate in realistic cloud platforms, resource profiling and a simulation stage over thousands of feasible schedules are needed. An optimal schedule on a cloud may take an intolerable amount of time to generate. Excessive response time for resource provisioning in a dynamic cloud platform is not acceptable.

Motivated by the simulation-based optimization methods in traffic analysis and supply chain management, we extend ordinal optimization (OO) [15], [16] to cloud workflow scheduling. The core of the OO approach is to generate a rough model resembling the real-life workflow scheduling problem. The discrepancy between the rough model and the real model can be resolved with the optimization of the rough model. We do not insist on finding the best policy but a set of suboptimal policies. The evaluation of the rough model results in much lower scheduling overhead, because the exhaustive search is confined to a much narrower search space. Our earlier publication [17] indicated the applicability of OO to performance improvement in distributed computing systems.
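The idea of ranking with a rough model and then examining only a small selected set can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the toy workload (`WORK`, `TOTAL_VMS`), the noise level of the rough model, the selected-set size, and all function names are hypothetical. The selection rule shown keeps the top candidates under the rough model, which the OO literature commonly calls horse-race selection.

```python
import itertools
import random

WORK = [40.0, 25.0, 10.0]  # hypothetical work units per task class
TOTAL_VMS = 12             # hypothetical cluster size
rng = random.Random(1)

def true_makespan(schedule):
    """Stand-in for the expensive, accurate evaluation of a schedule."""
    return max(work / vms for work, vms in zip(WORK, schedule))

def rough_makespan(schedule):
    """Stand-in for the rough model: the true value blurred by noise."""
    return true_makespan(schedule) + rng.gauss(0.0, 1.0)

def ordinal_select(candidates, rough_eval, accurate_eval, selected_size):
    """Rank every candidate with the rough model, keep the best few,
    and apply the accurate evaluation only to that selected set."""
    ranked = sorted(candidates, key=rough_eval)  # lower is better
    selected = ranked[:selected_size]
    return sorted(selected, key=accurate_eval)

# Every way to give each of the 3 task classes at least one of the 12 VMs.
candidates = [
    (a, b, TOTAL_VMS - a - b)
    for a, b in itertools.product(range(1, TOTAL_VMS), repeat=2)
    if TOTAL_VMS - a - b >= 1
]
good_enough = ordinal_select(candidates, rough_makespan, true_makespan,
                             selected_size=10)
print("candidates:", len(candidates), "accurately evaluated:", len(good_enough))
print("best selected schedule:", good_enough[0], true_makespan(good_enough[0]))
```

In this sketch the accurate evaluation runs on the selected set only, not on every candidate, which is where the saving in scheduling overhead would come from. The result is a set of good-enough schedules and is not guaranteed to contain the true optimum.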

The remainder of the paper is organized as follows. Section 2 introduces related work on workflow scheduling and ordinal optimization. Section 3 presents our model for multi-objective scheduling (MOS) applications. Section 4 proposes the algorithms for generating semi-optimal schedules to achieve efficient resource provisioning in clouds. Section 5 presents the LIGO workload [18] used to verify the efficiency of our proposed method. Section 6 reports the experimental results using our virtualized cloud platform. Finally, we conclude with some suggestions on future research work.

Section snippets

Related work and our unique approach

Recently, we have witnessed an escalating interest in the research towards resource allocation in grid workflow scheduling problems. Many classical optimization methods, such as opportunistic load balance, minimum execution time, and minimum completion time are reported in [19], and suffrage, min–min, max–min, and auction-based optimization are reported in [20], [21].

Yu et al. [22], [23] proposed economy-based methods to handle large-scale grid workflow scheduling under deadline constraints,

Multi-objective scheduling

In this section, we introduce our workflow scheduling model. In the latter portion of the section, we will identify the major challenges in realizing the model for efficient applications.

Vectorized ordinal optimization

The OO method applies only to single objective optimization. The vector ordinal optimization (VOO) [35] method optimizes over multiple objective functions. In this section, we first specify the OO algorithm. Thereafter, we describe the MOS algorithm based on VOO as an extension of the OO algorithm.
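With several objectives there is no single ranking of schedules, so vector ordinal optimization in the literature orders candidates by Pareto layers instead. The sketch below shows that layering for a minimization problem. It is only an illustration: the (makespan, cost) pairs and the function names are hypothetical, and the MOS algorithm itself is not reproduced in this excerpt.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly
    better in at least one (all objectives are minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_layers(points):
    """Peel off successive non-dominated fronts; layer 0 is the Pareto front."""
    remaining = list(points)
    layers = []
    while remaining:
        front = [p for p in remaining
                 if not any(dominates(q, p) for q in remaining)]
        layers.append(front)
        remaining = [p for p in remaining if p not in front]
    return layers

# Hypothetical (makespan, cost) pairs for five candidate schedules.
schedules = [(10, 5), (8, 7), (12, 4), (11, 6), (9, 9)]
layers = pareto_layers(schedules)
for index, layer in enumerate(layers):
    print(f"layer {index}: {layer}")
```

Here (11, 6) is dominated by (10, 5) and (9, 9) is dominated by (8, 7), so those two schedules fall into the second layer while the other three form the first. A selected set can then be built by taking whole layers from the front, in the same spirit as taking the top-ranked designs in single-objective OO.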

LIGO workflow analysis

We first introduce our LIGO application background and many-task workload characterizations. Thereafter we design the details of the implementation steps to further describe the procedure of MOS.

Experimental performance results

In this section, we report and interpret the performance data based on resource allocation experiments in LIGO workflow scheduling applications.

Conclusions

In this paper, we have extended the ordinal optimization method from a single objective to multiple objectives using a vectorized ordinal optimization (VOO) approach. We are the very first research group to propose this VOO approach for multi-objective scheduling (MOS) in many-task workflow applications. Many-task scheduling is often hindered by a large amount of uncertainty. Our original technical contributions are summarized below.

(1)
We proposed the VOO approach to

Acknowledgments

The authors would like to express their heartfelt thanks to all the referees who provided critical and valuable comments on the manuscript. This work was supported in part by the Ministry of Science and Technology of China under the National 973 Basic Research Program (grants No. 2011CB302805, No. 2013CB228206 and No. 2011CB302505), the National Natural Science Foundation of China (grant No. 61233016), and the Tsinghua National Laboratory for Information Science and Technology Academic Exchange Program. Fan

References (47)

M. Maheswaran et al., Dynamic mapping of a class of independent tasks onto heterogeneous computing systems, Journal of Parallel and Distributed Computing (1999)
S. Teng et al., Multi-objective ordinal optimization for simulation optimization problems, Automatica (2007)
R.N. Calheiros et al., A coordinator for scaling elastic applications across multiple clouds, Future Generation Computer Systems (2012)
D. Chen et al., Hybrid modelling and simulation of huge crowd over a hierarchical Grid architecture, Future Generation Computer Systems (2013)
D. Li et al., Constraint ordinal optimization, Information Sciences (2002)
Q.S. Jia et al., Comparison of selection rules for ordinal optimization, Mathematical and Computer Modelling (2006)
R. Duan, R. Prodan, T. Fahringer, Performance and cost optimization for multiple large-scale grid workflow...
Ioan Raicu, Many-Task Computing: Bridging the Gap between High Throughput Computing and High Performance Computing (2009)
I. Raicu, I. Foster, Y. Zhao, Many-task computing for grids and supercomputers, in: IEEE Workshop on Many-Task...
I. Raicu, I. Foster, Y. Zhao, et al. The quest for scalable support of data intensive workloads in distributed systems,...
M. Wieczorek, R. Prodan, A. Hoheisel, Taxonomies of the multi-criteria grid workflow scheduling problem coreGRID,...
K. Hwang et al., Scalable Parallel Computing: Technology, Architecture, Programming (1998)
Y. Wu et al., Adaptive workload prediction of grid performance in confidence windows, IEEE Transactions on Parallel and Distributed Systems (2010)
S. Lee, R. Eigenmann, Adaptive tuning in a dynamically changing resource environment, in: Proceeding of IEEE In’l...
D.H. Wolpert et al., No free lunch theorems for optimization, IEEE Transactions on Evolutionary Computation (1997)
I. Foster, Y. Zhao, I. Raicu, S. Lu, Cloud computing and grid computing 360-degree compared, in: IEEE Grid Computing...
C. Moretti, K. Steinhaeuser, D. Thain, N.V. Chawla, Scaling up classifiers to cloud computers, in: IEEE International...
Y. Zhao, I. Raicu, I. Foster, Scientific workflow systems for 21st century, new bottle or new wine? in: Proceedings of...
E. Deelman, C. Kesselman, et al. GriPhyN and LIGO, building a virtual data grid for gravitational wave scientists, in:...
Y.C. Ho et al., Ordinal optimization of discrete event dynamic systems, Journal of Discrete Event Dynamic Systems (1992)
Y.C. Ho et al., Ordinal Optimization, Soft Optimization for Hard Problems (2007)
F. Zhang, J. Cao, L. Liu, C. Wu, Fast autotuning configurations of parameters in distributed computing systems using...
K. Xu, J. Cao, L. Liu, C. Wu, Performance optimization of temporal reasoning for grid workflows using relaxed region...

Fan Zhang received his Ph.D. at the National CIMS Engineering Research Center, Tsinghua University, Beijing, China. He received a B.S. in computer science from Hubei University of Technology and an M.S. in control science and engineering from Huazhong University of Science and Technology. He has been awarded the IBM Ph.D. Fellowship. Dr. Zhang is currently a postdoctoral associate with the Kavli Institute for Astrophysics and Space Research at Massachusetts Institute of Technology. He is also a sponsored postdoctoral researcher at the Research Institute of Information Technology, Tsinghua University. His research interests include simulation-based optimization approaches, cloud computing resource provisioning, characterizing big-data scientific applications, and novel programming models for cloud computing.

Junwei Cao received his Ph.D. in computer science from the University of Warwick, Coventry, UK, in 2001. He received his bachelor's and master's degrees in control theories and engineering in 1996 and 1998, respectively, both from Tsinghua University, Beijing, China, where he is a Professor and Vice Director of the Research Institute of Information Technology. Prior to joining Tsinghua University in 2006, Dr. Cao was a research scientist at MIT LIGO Laboratory and NEC Laboratories Europe for 5 years. He has published over 130 papers with more than 3000 citations. He edited Cyberinfrastructure Technologies and Applications published by Nova Science in 2009. His research is focused on advanced computing technologies and applications. Dr. Cao is a senior member of the IEEE Computer Society and a member of the ACM and CCF.

Keqin Li is a SUNY distinguished professor of computer science and an Intellectual Ventures endowed visiting chair professor at Tsinghua University, China. His research interests are mainly in design and analysis of algorithms, parallel and distributed computing, and computer networking. Dr. Li has over 255 research publications and has received several Best Paper Awards for his research work. He is currently on the editorial boards of IEEE Transactions on Parallel and Distributed Systems and IEEE Transactions on Computers.

Samee U. Khan is an assistant professor at North Dakota State University. He received his Ph.D. from the University of Texas at Arlington in 2007. Dr. Khan’s research interests include cloud and big-data computing, social networking, and reliability. His work has appeared in over 200 publications, two of which received best paper awards. He is a Fellow of the IET and a Fellow of the BCS.

Kai Hwang is a professor of EE/CS at the University of Southern California. He also chairs the IV-endowed visiting chair professor group at Tsinghua University in China. He received his Ph.D. from the University of California, Berkeley in 1972. He has published 8 books and 220 papers, which have been cited over 11,200 times. His latest book Distributed and Cloud Computing (with G. Fox and J. Dongarra) was published by Kaufmann in 2011. An IEEE Fellow, Hwang received the CFC Outstanding Achievement Award in 2004, the Founders Award from IEEE IPDPS-2011 and a Lifetime Achievement Award from IEEE Cloudcom-2012. He has served as the Editor-in-Chief of the Journal of Parallel and Distributed Computing for 28 years and delivered 35 keynote speeches in major IEEE/ACM Conferences.
