Adaptive data-aware utility-based scheduling in resource-constrained systems

https://doi.org/10.1016/j.jpdc.2009.08.006

Abstract

This paper addresses the problem of dynamically scheduling data-intensive multiprocessor jobs. Each job requires some number of CPUs and some amount of data that needs to be downloaded into local storage. The completion of each job brings some benefit (utility) to the system, and the goal is to find the optimal scheduling policy that maximizes the average utility per unit of time obtained from all completed jobs. A co-evolutionary solution methodology is proposed, in which the utility-based policies for managing local storage and for scheduling jobs onto the available CPUs mutually affect each other’s environments, with both policies being adaptively tuned using the Reinforcement Learning (RL) methodology. The simulation results demonstrate that the performance of the scheduling policies improves significantly as a result of being tuned with RL, to the point that they outperform the best scheduling algorithm suggested in the literature for jobs with soft-deadline utility functions.

Introduction

Scheduling in High Performance Computing (HPC) systems is becoming an increasingly important and difficult task. An HPC system can have as many as 10^5 multi-threaded processors and can cost many millions of dollars [6]. Correspondingly, it is desirable to operate such systems as efficiently as possible.

Most HPC jobs need to access some data stored on local disk or remote storage, and some jobs need to access very large amounts of data, especially in applications such as high-energy physics, natural language processing, astronomy, and bioinformatics. As noted in [4], the amount of data processed by scientific applications has been increasing exponentially since 1990, at an even faster rate than predicted by Moore’s law. Thus, efficient data-aware scheduling will become a critical issue in scientific computing in the very near future.

If the scheduling system does not explicitly account for the amount of data each job needs to access, then some jobs will occupy the CPU resources for much longer than necessary, since they will spend significant amounts of time idling and waiting for the data to be read from remote storage. Thread-level schedulers (in Unix, Solaris, etc.) solve this problem by placing multiple threads on a single CPU, so that while some threads are waiting for their data to come in, the CPU can work on the other threads. However, job-level schedulers in HPC systems typically operate under strict security requirements, under which jobs do not share CPU resources. Despite the potentially large losses in system productivity associated with ignoring job data requirements, the data-aware scheduling problem has received little attention so far.

This paper addresses this problem in the context of utility-based optimization, where completion of each job is assumed to bring some benefit (utility) to the system (a decreasing function of the job response time) and the goal is to maximize the average benefit obtained per unit of time from all completed jobs. This utility accrual (UA) formulation was initially proposed by Jensen in [3] and is a generalization of the standard deadline-based scheduling, where the benefit received from each job is 1 if the job is completed before its deadline and 0 otherwise.
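For concreteness, here is a minimal Python sketch of one possible soft-deadline utility function; the flat-then-linear-decay shape and all parameter names are illustrative assumptions, since the UA formulation only requires the utility to be a decreasing function of the job response time.

```python
def soft_deadline_utility(response_time: float, deadline: float,
                          max_utility: float, decay_window: float) -> float:
    """Full utility up to the soft deadline, then linear decay to zero.
    (An assumed shape; any decreasing function of response time fits the UA model.)"""
    if response_time <= deadline:
        return max_utility
    overrun = response_time - deadline
    return max(0.0, max_utility * (1.0 - overrun / decay_window))


def hard_deadline_utility(response_time: float, deadline: float) -> float:
    """The special case mentioned above: benefit 1 before the deadline, 0 after."""
    return 1.0 if response_time <= deadline else 0.0
```

Under either function, the scheduler’s objective is the same: maximize the sum of utilities of completed jobs divided by the elapsed time.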

As suggested in the recent overview of the UA real-time scheduling domain [7], the only algorithm that allows jobs to share (mutually exclusively) a finite amount of a certain resource (e.g., local storage space) is presented in [11]. However, this algorithm is not forward-looking, in the sense that it simply orders the existing jobs according to their expected utility upon completion divided by the time remaining to completion (the potential utility density), without accounting for what might happen in the near future; ideally, an algorithm should not schedule a large job that has a low utility upon completion if the probability of a new high-utility job arriving is large enough. We use this algorithm as a benchmark in our data scheduling experiments in Section 7, calling it NDP-RUU, and we also use a preemption-enabled version of it, calling it BDP-RUU (a sketch of the ordering rule is given below). The only UA real-time scheduling algorithm we are aware of that explicitly tries to maximize the expected future utility received by the system (by statistically estimating the job arrival probabilities) is presented in [10]. However, this algorithm does not consider any data constraints, and we are not aware of any other utility-maximizing real-time scheduling algorithms that consider multi-unit resource constraints.
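The greedy ordering rule used by NDP-RUU can be sketched as follows; the `Job` fields and function names are hypothetical, and only the potential-utility-density rule itself comes from the description above.

```python
from dataclasses import dataclass


@dataclass
class Job:
    expected_utility: float  # expected utility upon completion
    remaining_time: float    # estimated time remaining to completion


def potential_utility_density(job: Job) -> float:
    # Expected utility upon completion divided by the time remaining to completion.
    return job.expected_utility / max(job.remaining_time, 1e-9)


def ndp_ruu_order(jobs: list[Job]) -> list[Job]:
    # Non-forward-looking: rank only the currently present jobs,
    # highest potential utility density first.
    return sorted(jobs, key=potential_utility_density, reverse=True)
```

Because the ranking ignores future arrivals, a long low-utility job at the head of this order may still be a poor choice when a high-utility arrival is imminent, which is exactly the weakness noted above.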

As noted in [5], the problem of scheduling jobs in the UA paradigm even on a single CPU with single-unit resource constraints is NP-hard. Thus, a heuristic needs to be developed for solving this problem in the “static” case of focusing only on the currently arrived jobs (as done in [5], [11]). Alternatively, an adaptive policy tuning algorithm needs to be used in the “dynamic” case when jobs arrive stochastically and the scheduler needs to learn their arrival pattern and make appropriate forward-looking scheduling decisions.

The complexity of the policy tuning process increases (usually exponentially) with the number of input variables used by the policy. In the data-aware scheduling problem, the input variables need to describe the state of the local storage, of the CPU module executing jobs, and of the jobs waiting for their data to be downloaded or to be executed on the CPU module. Instead of using a centralized learning algorithm (which would be impractically slow given the large number of input variables it needs to consider), we propose a distributed learning approach, in which the CPU module and the local storage module are self-managing and self-optimizing. In particular, each module uses the Reinforcement Learning (RL) methodology [8] to adaptively tune the policy for managing its main resource (CPUs or local storage space). The CPU management policy and the corresponding storage management policy mutually affect each other’s environments, making the self-optimization process co-evolutionary.

Co-evolutionary RL has already been used successfully in some domains (e.g., [2], [1]). However, all co-evolutionary RL frameworks that we are aware of have used agents that directly affect the global environment and receive either individual reinforcement signals from it or a single reinforcement signal for all agents. In this paper we are faced with a sequential multi-agent learning task, where the actions of the first agent (the one managing the local storage) affect the state of the second agent (the one managing the CPU resources), and the feedback signal received from the environment (the total utility of jobs executed over some period of time) is affected directly only by the actions of the second agent. In order to resolve the difficulty of the first agent learning without a direct feedback signal, we propose a new idea of letting it use the “value function” V(x) of the second agent in order to define its feedback signal (sketched below). This process is explained in greater detail in Section 6. To the best of our knowledge, this is the first co-evolutionary feedback learning architecture that uses this idea.
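A minimal sketch of this feedback construction, under the assumption that the storage agent is rewarded by the change its action induces in the CPU agent’s value estimate; the function and state names are hypothetical, and the paper’s exact construction is given in Section 6.

```python
from typing import Callable


def storage_agent_reward(v2: Callable[[object], float],
                         cpu_state_before: object,
                         cpu_state_after: object) -> float:
    """Proxy reward for the storage agent (agent 1), which receives no direct
    utility signal: credit it with how much its data-staging action improved
    the CPU agent's (agent 2's) estimate of expected future utility."""
    return v2(cpu_state_after) - v2(cpu_state_before)
```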

The purpose of this paper is to present a feasibility study of applying the co-evolutionary RL solution approach described above to the data-aware scheduling problem. While this approach noticeably outperformed the best non-adaptive policy described in the literature for this domain (NDP-RUU, mentioned above), an extensive comparison of the co-evolutionary RL solution with other possible solutions to the data-aware scheduling problem within the UA paradigm is outside the scope of this paper. Our hope is that the current feasibility study will inspire other researchers to apply this methodology to new domains and perform in-context comparisons with other techniques.

The rest of the paper is organized as follows. Section 2 formulates the scheduling problem to be solved. Section 3 gives an overview of the solution framework used in this paper. Section 4 reviews how a value function is learned with RL, and Section 5 describes how the RL framework is instantiated for this problem. Section 6 explains the specific scheduling algorithms that were developed. Section 7 presents numerical simulations that demonstrate the value of the proposed algorithms and gives some intuition about the observed results. Finally, Section 8 concludes the paper.

Section snippets

Problem formulation

Consider an HPC machine that has a CPU module (with multiple CPUs) paired with a local storage module of finite capacity, as depicted in Fig. 1. Jobs that should be executed on the CPU module arrive in a stochastic manner. We also assume the following:

  • Each job i requires some ideal number of CPUs R_i^max to be executed in the minimum possible time t_i^(E,min).

  • Jobs are “moldable”, meaning that a job i can be executed with fewer than R_i^max CPUs at the cost of a slower execution rate. However, the
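As one illustration of moldability, a linear-speedup model is a common assumption (not necessarily the model used in the paper), under which execution time scales inversely with the number of allocated CPUs up to the ideal allocation R_i^max:

```python
def execution_time(t_e_min: float, r_max: int, r_alloc: int) -> float:
    """Assumed linear-speedup model for a moldable job: t_e_min is the minimum
    execution time achieved with the ideal allocation of r_max CPUs; allocating
    r_alloc <= r_max CPUs slows the job down proportionally."""
    assert 1 <= r_alloc <= r_max, "a moldable job runs on 1..r_max CPUs"
    return t_e_min * r_max / r_alloc
```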

The proposed architecture

As was mentioned in the Introduction, the proposed architecture simplifies the complexity of the scheduling problem by using two separate scheduling modules. These modules will also sometimes be referred to as agents in the rest of the paper.

The CPU Scheduling Module (CPU-SM) monitors jobs ready for execution (those whose data has already been downloaded into the local storage) and the currently executing jobs. Based on this information, the module decides which jobs should execute first and

Learning a value function with RL

The standard procedure for learning a value function with RL consists first of choosing a value function approximation architecture with some tunable parameters, and then of tuning these parameters in the course of observing the system’s evolution. Since each module in the proposed architecture chooses its actions so as to maximize its current value function, the decision-making policy it uses evolves as the value function evolves. Under certain conditions, this evolution process can be proved
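As a concrete instance of this procedure, the sketch below shows a standard TD(0) update for a linear approximation architecture V(x) = θ·φ(x) [8]; the paper’s actual architecture and update rule (its Eq. (1)) may differ in their details.

```python
import numpy as np


def td0_update(theta: np.ndarray, phi_x: np.ndarray, phi_x_next: np.ndarray,
               reward: float, alpha: float = 0.01, gamma: float = 0.99) -> np.ndarray:
    """One temporal-difference step for V(x) = theta . phi(x): move the tunable
    parameters theta toward the bootstrapped target r + gamma * V(x')."""
    td_error = reward + gamma * (theta @ phi_x_next) - (theta @ phi_x)
    return theta + alpha * td_error * phi_x
```

Because each module picks the action that maximizes its current V(x), every parameter update of this kind also changes the decision-making policy, which is the evolution process referred to above.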

Instantiating the RL framework

Several crucial decisions need to be made every time the RL framework is applied to a practical problem:

  • Defining the appropriate reward signal r, which should be correlated with what one ultimately wants to optimize as a result of learning. This signal can differ from the ultimate objective if that makes the reward signal more regularly observable or more correlated with the agent’s actions.

  • Defining the action space for the agent. Each action should ideally have an observable impact on the next

CPU scheduling module algorithms

Several adaptive and non-adaptive scheduling algorithms for CPU-SM were evaluated in [10]. This paper studies only the scheduling algorithm based on the value function V(x) tuned using the RL process in Eq. (1), which was shown in [10] to outperform all other algorithms. This algorithm starts by scheduling available jobs in a best-fit manner so as to minimize the number of idle CPUs (a sketch of this first stage is given below). If some jobs are still waiting, then the algorithm proceeds by using the following RL preemption (RLP) policy:
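A minimal sketch of the best-fit first stage, under assumed data structures; the RL preemption (RLP) policy that follows it is the adaptive, RL-tuned part and is not sketched here.

```python
from dataclasses import dataclass


@dataclass
class ReadyJob:
    name: str
    cpus_requested: int


def best_fit_schedule(ready_jobs: list[ReadyJob], free_cpus: int) -> list[ReadyJob]:
    """Greedily pack the free CPUs so as to minimize the number left idle
    (one simple heuristic; considering larger requests first finds tight
    packings early)."""
    scheduled = []
    for job in sorted(ready_jobs, key=lambda j: j.cpus_requested, reverse=True):
        if job.cpus_requested <= free_cpus:
            scheduled.append(job)
            free_cpus -= job.cpus_requested
    return scheduled
```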

Simulation

The key parameters that affect the dynamics of the scheduling system described in Section 2 are the job arrival rate and the size of the local storage space relative to the average amount of data required by each job. If the local storage space is large enough, then job queuing will only happen among the available jobs (whose data has already been downloaded). Even if the job arrival rate is very large, the available job queue will still not “overflow” to the level of arrived jobs because jobs

Conclusion

This paper presented a novel co-evolutionary framework for solving the joint problem of managing the local storage space and the CPU resources of a computing system. The presented instantiation of this framework in the scheduling domain can apply (with suitable modifications to the state variables) to any computing system that uses a local cache to speed up execution of some jobs. More generally, the distributed co-evolutionary aspect of this framework makes it applicable to a greater class of




David Vengerov is a staff engineer at Sun Microsystems Laboratories. He is the principal investigator for the Adaptive Optimization project developing and implementing self-managing and self-optimizing capabilities in computer systems. His primary research interests include Utility and Autonomic Computing, Reinforcement Learning Algorithms, and Multi-Agent Systems. He holds a Ph.D. in Management Science and Engineering from Stanford University, an M.S. in Engineering Economic Systems and Operations Research from Stanford University, an M.S. in Electrical Engineering and Computer Science from MIT and a B.S. in Mathematics from MIT.

Lykomidis Mastroleon received his Ph.D. in electrical engineering from Stanford University in 2009 and his bachelor’s degree in electrical and computer engineering from the National Technical University of Athens, Greece in 2002. His current research interests include revenue management, dynamic resource allocation, control in large scale stochastic systems and queueing.

Declan Murphy is a Senior Staff Engineer at Sun Microsystems, where he has worked since 1992. He is currently a technical lead for Sun’s “Ops Center” system management product. Prior to that he led the Administrative Environment team for Phase 2 of Sun’s DARPA HPCS program and worked on early stages of Sun’s N1 product line. Before focusing on system management, he worked on Sun’s highly available clustering products. He was tech lead and ultimately architect for the Sun Cluster product, focused on a highly available kernel-based ORB and other availability infrastructure. He received BA and BAI degrees in computer engineering from Trinity College, Dublin, Ireland in 1987.

Nick Bambos is a Professor at Stanford University, holding a joint appointment in the Department of Electrical Engineering and the Department of Management Science & Engineering. He heads the Network Architecture and Performance Engineering research group at Stanford, conducting research in wireless network architectures, the Internet infrastructure, packet switching, network management, and information service engineering through various projects of his Network Architecture Laboratory (NetLab). His current technology research interests include high-performance networking, autonomic computing, and service engineering. His methodological interests are in network control, online task scheduling, queueing systems, and stochastic processing networks. He received his Ph.D. in Electrical Engineering and Computer Sciences (EECS) from the University of California at Berkeley (1989), as well as an M.S. in EECS (1987) and an M.A. in Mathematics (1989) from the same university.

This material is based upon work supported by DARPA under Contract No. NBCH3039002.
