Multi-objective scheduling of many tasks in cloud platforms

https://doi.org/10.1016/j.future.2013.09.006

Highlights

  • We propose an ordinal optimized method for multi-objective many-task scheduling.

  • We prove the suboptimality of the proposed method through mathematical analysis.

  • Our method significantly reduces scheduling overhead by introducing a rough model.

  • Our method delivers a set of semi-optimal good-enough scheduling solutions.

  • We demonstrate the effectiveness of the method on a real-life workload benchmark.

Abstract

The scheduling of a many-task workflow in a distributed computing platform is a well-known NP-hard problem. The problem becomes even more complex and challenging when virtualized clusters are used to execute a large number of tasks in a cloud computing platform. The difficulty lies in satisfying multiple objectives that may conflict with one another. For instance, it is difficult to minimize the makespan of many tasks while also reducing the resource cost and preserving fault tolerance and/or quality of service (QoS). These conflicting requirements and goals are difficult to optimize because of unknown runtime conditions, such as the availability of resources and random workload distributions. Instead of taking a very long time to generate an optimal schedule, we propose a new method to generate suboptimal or sufficiently good schedules for smooth multitask workflows on cloud platforms.

Our new multi-objective scheduling (MOS) scheme is specially tailored for clouds and is based on the ordinal optimization (OO) method, which was originally developed by the automation community for the design optimization of very complex dynamic systems. We extend the OO scheme to meet the special demands of cloud platforms that apply to virtual clusters of servers from multiple data centers. We prove the suboptimality through mathematical analysis. The major advantage of our MOS method lies in its significantly reduced scheduling overhead combined with close-to-optimal performance. Extensive experiments were carried out on virtual clusters with 16 to 128 virtual machines. The multitasking workflow is obtained from a real scientific LIGO workload for earth gravitational wave analysis. The experimental results show that our proposed algorithm rapidly and effectively generates a small set of semi-optimal scheduling solutions. On a 128-node virtual cluster, the method reduces the search time for semi-optimal workflow schedules a thousandfold compared with the Monte Carlo and Blind Pick methods.

Introduction

Large-scale workflow scheduling demands efficient and simultaneous allocation of heterogeneous CPU, memory, and network bandwidth resources for executing a large number of computational tasks. This resource allocation problem is NP-hard [1], [2]. Effectively scheduling many dependent or independent tasks on distributed resources, which may be virtualized clusters of servers in a cloud platform, makes the problem even more complex and challenging to solve with guaranteed solution quality.

The many-task computing paradigms were treated in [3], [4], [5]. These paradigms pose new challenges to the scalability problem, because they may contain large volumes of datasets and loosely coupled tasks. The optimization requires achieving multiple objectives. For example, it is rather difficult to minimize the scheduling makespan and the total cost while preserving fault tolerance and QoS at the same time. Many researchers have suggested heuristics for this problem [6].

The execution of a large-scale workflow encounters a high degree of randomness in the system and workload conditions [7], [8], such as unpredictable execution times, variable cost factors, and fluctuating workloads, which make the scheduling problem computationally intractable [9]. The lack of information on runtime dynamicity defies the use of deterministic scheduling models, in which the uncertainties are either ignored or simplified with an observed average.

Structural information about the workflow scheduling problem sheds light on its inner properties and opens the door to many heuristic methods. The no-free-lunch theorems [10] suggest that, without prior structural knowledge, all search algorithms for the optimum of a complex problem perform the same on average. We therefore need to exploit prior knowledge of the randomness, or reveal a relationship between the scheduling policy and the performance metrics applied.

The emerging cloud computing paradigm [11], [12], [13] attracts industrial, business, and academic communities. Cloud platforms are well suited to handling many loosely coupled tasks simultaneously. Our LIGO [14] benchmark programs are run on a virtualized cloud platform with a variable number of virtual clusters, built with many virtual machines on fewer physical machines and virtual nodes, as shown in Fig. 1 of Section 3. However, because many-task workloads fluctuate in realistic cloud platforms, resource profiling and simulation of thousands of feasible schedules are needed. An optimal schedule on a cloud may take an intolerable amount of time to generate. Excessive response time for resource provisioning in a dynamic cloud platform is not acceptable.

Motivated by simulation-based optimization methods in traffic analysis and supply chain management, we extend ordinal optimization (OO) [15], [16] to cloud workflow scheduling. The core of the OO approach is to generate a rough model that resembles the real workflow scheduling problem. The discrepancy between the rough model and the real model can be resolved with the optimization of the rough model. We do not insist on finding the best policy but rather a set of suboptimal policies. Evaluating the rough model results in much lower scheduling overhead, because the exhaustive search is confined to a much narrower search space. Our earlier publication [17] indicated the applicability of OO to performance improvement in distributed computing systems.
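The idea of ranking with a rough model and spending the expensive evaluation only on a small selected set can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the function names, the three task-class loads, the 16-VM budget, and the noise level of the rough model are all hypothetical values chosen for illustration.

```python
import random

def ordinal_optimization(candidates, rough_eval, accurate_eval, select_size):
    """Rank all candidates with a cheap rough model, then apply the
    expensive evaluation only to the top `select_size` of them."""
    # Step 1: order every candidate by its rough (cheap, noisy) estimate.
    ranked = sorted(candidates, key=rough_eval)
    # Step 2: keep the selected set, i.e. the best-looking candidates.
    selected = ranked[:select_size]
    # Step 3: re-rank only the selected set with the accurate model.
    return sorted(selected, key=accurate_eval)

# Hypothetical workload: three task classes with fixed total work each.
LOADS = (400.0, 250.0, 150.0)
TOTAL_VMS = 16

def makespan(alloc):
    """Accurate model (assumed): slowest class determines the makespan."""
    return max(load / vms for load, vms in zip(LOADS, alloc))

rng = random.Random(0)  # seeded so the sketch is repeatable

def rough_makespan(alloc):
    """Rough model (assumed): accurate value distorted by up to 30% noise."""
    return makespan(alloc) * (1.0 + rng.uniform(-0.3, 0.3))

# Every way to split 16 VMs among three classes, at least one VM each.
candidates = [(a, b, TOTAL_VMS - a - b)
              for a in range(1, TOTAL_VMS - 1)
              for b in range(1, TOTAL_VMS - a)]

good_enough = ordinal_optimization(candidates, rough_makespan, makespan,
                                   select_size=10)
print(good_enough[0], makespan(good_enough[0]))
```

Only 10 of the 105 candidate allocations are evaluated with the accurate model here, which is where the overhead saving would come from. The price is that the true optimum is not guaranteed to be in the selected set.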

The remainder of the paper is organized as follows. Section 2 introduces related work on workflow scheduling and ordinal optimization. Section 3 presents our model for multi-objective scheduling (MOS) applications. Section 4 proposes the algorithms for generating semi-optimal schedules to achieve efficient resource provisioning in clouds. Section 5 presents the LIGO workload [18] used to verify the efficiency of our proposed method. Section 6 reports the experimental results obtained on our virtualized cloud platform. Finally, we conclude with some suggestions for future research work.

Section snippets

Related work and our unique approach

Recently, we have witnessed an escalating interest in research on resource allocation in grid workflow scheduling problems. Many classical optimization methods, such as opportunistic load balancing, minimum execution time, and minimum completion time, are reported in [19], and suffrage, min–min, max–min, and auction-based optimization are reported in [20], [21].
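As an illustration of one of the classical heuristics named above, the following is a minimal Python sketch of min–min scheduling for independent tasks. It follows the textbook form of the heuristic rather than any specific implementation in the cited works, and the execution-time matrix is a made-up example.

```python
def min_min(exec_time):
    """Min-min heuristic for independent tasks.

    exec_time[t][m] is the estimated run time of task t on machine m.
    Returns (assignment, makespan), where assignment maps task -> machine.
    """
    n_machines = len(exec_time[0])
    ready = [0.0] * n_machines           # time at which each machine is free
    unscheduled = set(range(len(exec_time)))
    assignment = {}
    while unscheduled:
        best = None                      # (completion_time, task, machine)
        # Find the (task, machine) pair with the smallest completion time,
        # i.e. the minimum over tasks of each task's minimum completion time.
        for t in unscheduled:
            for m in range(n_machines):
                completion = ready[m] + exec_time[t][m]
                if best is None or completion < best[0]:
                    best = (completion, t, m)
        completion, t, m = best
        assignment[t] = m
        ready[m] = completion            # the machine is busy until then
        unscheduled.remove(t)
    return assignment, max(ready)

# Hypothetical run times: 3 tasks (rows) on 2 machines (columns).
exec_time = [[3, 5],
             [2, 4],
             [6, 1]]
assignment, makespan = min_min(exec_time)
print(assignment, makespan)
```

Max–min differs only in the selection step: among the per-task minimum completion times it schedules the task with the largest one first, which tends to place long tasks early.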

Yu et al.  [22], [23] proposed economy-based methods to handle large-scale grid workflow scheduling under deadline constraints,

Multi-objective scheduling

In this section, we introduce our workflow scheduling model. In the latter portion of the section, we will identify the major challenges in realizing the model for efficient applications.

Vectorized ordinal optimization

The OO method applies only to single-objective optimization. The vector ordinal optimization (VOO) [35] method optimizes over multiple objective functions. In this section, we first specify the OO algorithm. Thereafter, we describe the MOS algorithm based on VOO as an extension of the OO algorithm.
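With several objectives there is no single ordering of schedules, so candidates are usually compared by Pareto dominance and grouped into successive non-dominated layers. The sketch below shows that layering in Python under the assumption that every objective is minimized. It is a generic illustration, not the MOS algorithm of this paper, and the (makespan, cost) pairs are invented.

```python
def dominates(a, b):
    """True if `a` is no worse than `b` in every objective and strictly
    better in at least one (all objectives are minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_layers(points):
    """Peel off successive non-dominated fronts.

    Layer 1 is the Pareto front of all points, layer 2 is the Pareto
    front of what remains after removing layer 1, and so on.
    """
    remaining = list(points)
    layers = []
    while remaining:
        front = [p for p in remaining
                 if not any(dominates(q, p) for q in remaining)]
        layers.append(front)
        remaining = [p for p in remaining if p not in front]
    return layers

# Hypothetical schedules scored as (makespan, cost); lower is better.
schedules = [(10, 5), (8, 7), (12, 4), (9, 9), (11, 6), (13, 8)]
layers = pareto_layers(schedules)
for rank, layer in enumerate(layers, start=1):
    print(rank, layer)
```

In this example the first layer holds the three schedules that trade makespan against cost without being beaten on both, and the layer index plays the role that the rank position plays in the single-objective case.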

LIGO workflow analysis

We first introduce our LIGO application background and many-task workload characterizations. Thereafter we design the details of the implementation steps to further describe the procedure of MOS.

Experimental performance results

In this section, we report and interpret the performance data based on resource allocation experiments in LIGO workflow scheduling applications.

Conclusions

In this paper, we have extended the ordinal optimization method from a single objective to multiple objectives using a vectorized ordinal optimization (VOO) approach. We are the first research group to propose this VOO approach to achieve multi-objective scheduling (MOS) in many-task workflow applications. Many-task scheduling is often hindered by a large number of uncertainties. Our original technical contributions are summarized below.

  • (1) We proposed the VOO approach to

Acknowledgments

The authors would like to express their heartfelt thanks to all the referees who provided critical and valuable comments on the manuscript. This work was supported in part by Ministry of Science and Technology of China under National 973 Basic Research Program (grants No. 2011CB302805, No. 2013CB228206 and No. 2011CB302505), National Natural Science Foundation of China (grant No. 61233016), and Tsinghua National Laboratory for Information Science and Technology Academic Exchange Program. Fan

References (47)

  • M. Wieczorek, R. Prodan, A. Hoheisel, Taxonomies of the multi-criteria grid workflow scheduling problem coreGRID,...
  • K. Hwang et al., Scalable Parallel Computing: Technology, Architecture, Programming (1998)
  • Y. Wu et al., Adaptive workload prediction of grid performance in confidence windows, IEEE Transactions on Parallel and Distributed Systems (2010)
  • S. Lee, R. Eigenmann, Adaptive tuning in a dynamically changing resource environment, in: Proceeding of IEEE In’l...
  • D.H. Wolpert et al., No free lunch theorems for optimization, IEEE Transactions on Evolutionary Computation (1997)
  • I. Foster, Y. Zhao, I. Raicu, S. Lu, Cloud computing and grid computing 360-degree compared, in: IEEE Grid Computing...
  • C. Moretti, K. Steinhaeuser, D. Thain, N.V. Chawla, Scaling up classifiers to cloud computers, in: IEEE International...
  • Y. Zhao, I. Raicu, I. Foster, Scientific workflow systems for 21st century, new bottle or new wine? in: Proceedings of...
  • E. Deelman, C. Kesselman, et al. GriPhyN and LIGO, building a virtual data grid for gravitational wave scientists, in:...
  • Y.C. Ho et al., Ordinal optimization of discrete event dynamic systems, Journal of Discrete Event Dynamic Systems (1992)
  • Y.C. Ho et al., Ordinal Optimization, Soft Optimization for Hard Problems (2007)
  • F. Zhang, J. Cao, L. Liu, C. Wu, Fast autotuning configurations of parameters in distributed computing systems using...
  • K. Xu, J. Cao, L. Liu, C. Wu, Performance optimization of temporal reasoning for grid workflows using relaxed region...

    Fan Zhang received his Ph.D. at the National CIMS Engineering Research Center, Tsinghua University, Beijing, China. He received a B.S. in computer science from Hubei University of Technology and an M.S. in control science and engineering from Huazhong University of Science and Technology. He has been awarded the IBM Ph.D. Fellowship. Dr. Zhang is currently a postdoctoral associate with the Kavli Institute for Astrophysics and Space Research at Massachusetts Institute of Technology. He is also a sponsored postdoctoral researcher at the Research Institute of Information Technology, Tsinghua University. His research interests include simulation-based optimization approaches, cloud computing resource provisioning, characterizing big-data scientific applications, and novel programming model for cloud computing.

    Junwei Cao received his Ph.D. in computer science from the University of Warwick, Coventry, UK, in 2001. He received his bachelor and master degrees in control theories and engineering in 1996 and 1998, respectively, both from Tsinghua University, Beijing, China where he is a Professor and Vice Director of Research Institute of Information Technology. Prior to joining Tsinghua University in 2006, Dr. Cao was a research scientist at MIT LIGO Laboratory and NEC Laboratories Europe for 5 years. He has published over 130 papers with more than 3000 citations. He edited Cyberinfrastructure Technologies and Applications published by Nova Science in 2009. His research is focused on advanced computing technologies and applications. Dr. Cao is a senior member of the IEEE Computer Society and a member of the ACM and CCF.

    Keqin Li is a SUNY distinguished professor of computer science and an Intellectual Ventures endowed visiting chair professor at Tsinghua University, China. His research interests are mainly in design and analysis of algorithms, parallel and distributed computing, and computer networking. Dr. Li has over 255 research publications and has received several Best Paper Awards for his research work. He is currently on the editorial boards of IEEE Transactions on Parallel and Distributed Systems and IEEE Transactions on Computers.

    Samee U. Khan is an assistant professor at the North Dakota State University. He received his Ph.D. from the University of Texas at Arlington in 2007. Dr. Khan’s research interests include cloud and big-data computing, social networking, and reliability. His work has appeared in over 200 publications with two receiving best paper awards. He is a Fellow of the IET and a Fellow of the BCS.

    Kai Hwang is a professor of EE/CS at the University of Southern California. He also chairs the IV-endowed visiting chair professor group at Tsinghua University in China. He received the Ph.D. from University of California, Berkeley in 1972. He has published 8 books and 220 papers, which have been cited over 11,200 times. His latest book Distributed and Cloud Computing (with G. Fox and J. Dongarra) was published by Kaufmann in 2011. An IEEE Fellow, Hwang received CFC Outstanding Achievement Award in 2004, the Founders Award from IEEE IPDPS-2011 and a Lifetime Achievement Award from IEEE Cloudcom-2012. He has served as the Editor-in-Chief of the Journal of Parallel and Distributed Computing for 28 years and delivered 35 keynote speeches in major IEEE/ACM Conferences.
