Elsevier

Performance Evaluation

Volume 79, September 2014, Pages 306-327
Value driven load balancing

https://doi.org/10.1016/j.peva.2014.07.019

Abstract

To date, the study of dispatching or load balancing in server farms has primarily focused on the minimization of response time. Server farms are typically modeled by a front-end router that employs a dispatching policy to route jobs to one of several servers, with each server scheduling all the jobs in its queue via Processor-Sharing. However, the common assumption has been that all jobs are equally important or valuable, in that they are equally sensitive to delay. Our work departs from this assumption: we model each arrival as having a randomly distributed value parameter, independent of the arrival’s service requirement (job size). Given such value heterogeneity, the correct metric is no longer the minimization of response time, but rather the minimization of value-weighted response time. In this context, we ask “what is a good dispatching policy to minimize the value-weighted response time metric?” We propose a number of new dispatching policies that are motivated by this goal. Via a combination of exact analysis, asymptotic analysis, and simulation, we deduce many unexpected results regarding dispatching.

Introduction

Server farms are commonplace today in web servers, data centers, and in compute clusters. Such architectures are inexpensive (compared to a single fast server) and afford flexibility and scalability in computational power. However, their efficiency relies on having a good algorithm for routing incoming jobs to servers.

A typical server farm consists of a front-end router, which receives all the incoming jobs and dispatches each job to one of a collection of servers which do the actual processing, as depicted in Fig. 1. The servers themselves are “off-the-shelf” commodity servers which typically schedule all jobs in their queue via Processor-Sharing (PS); this cannot easily be changed to some other scheduling policy. All the decision-making is done at the central dispatcher. The dispatcher (also called a load balancer) employs a dispatching policy (often called a load balancing policy or a task assignment policy), which specifies to which server an incoming request should be routed. Each incoming job is immediately dispatched by the dispatcher to one of the servers (this immediate dispatching is important because it allows the server to quickly set up a connection with the client, before the connection request is dropped). Typical dispatchers used include Cisco’s Local Director  [1], IBM’s Network Dispatcher  [2], F5’s Big IP  [3], Microsoft Sharepoint  [4], etc. Since scheduling at the servers is not under our control, it is extremely important that the right dispatching policy is used.

Prior work has studied dispatching policies with the goal of minimizing mean response time, E[T]; a job’s response time is the time from when the job arrives until it completes. Several papers have specifically studied the case where the servers schedule their jobs via PS (see [5], [6], [7], [8], [9], [10], [11], [12]). Here, it has been shown that the Join-the-Shortest-Queue (JSQ) policy performs very well for general job size distributions. Even picking the shortest of a small subset of the queues, or simply trying to pick an idle queue if one exists, works very well. Interestingly, simple policies like JSQ are superior even to policies like Least-Work-Left, which routes a job to the server with the least remaining total work (the sum of the remaining sizes of all jobs at the queue), rather than simply looking at the number of jobs [13]. In addition, there have been many more papers studying dispatching policies where the servers schedule jobs in First-Come–First-Served (FCFS) order (see e.g., [9], [14], [15], [16], [17], [18], [19], [20], [21], [22], [23], [24], [25]). Here high job size variability can play a large role, and policies like Size-Interval-Task-Assignment (SITA) [14], which segregates jobs based on job size, or Least-Work-Left [26], which routes jobs to the queue with the least total remaining work (rather than the smallest number of jobs), are far superior to JSQ.

However, all of this prior work has assumed that jobs have equal importance (value), in that they are equally sensitive to delay. In practice, this is not the case. Some jobs might be background jobs, which are largely insensitive to delay, while others have a live user waiting for the result of the computation. There may be other jobs that are even more important, in that many users depend on their results, or other jobs depend on their completion. We assume that every job has a value, V, independent of its size (service requirement). Given jobs with heterogeneous values, the right metric to minimize is not the mean response time, E[T], but rather the mean value-weighted response time, E[VT], where jobs of higher value (importance) are given lower response times.

The problem of minimizing E[VT], where V and T are independent, is also not new, although it has almost exclusively been considered in the case of server scheduling, not in the case of dispatching (see Prior Work section). Specifically, there is a large body of work in the operations research community where jobs have a holding cost, c, independent of the job size, and the goal is to minimize E[cT] over all jobs. Here it is well-known that the cμ rule is optimal [27]. In the cμ rule, c refers to a job’s holding cost and μ is the reciprocal of a job’s size. The cμ rule always runs the job with the highest product cμ; thus, jobs with high holding cost and/or small size are favored. However, no cμ-like dispatching policy has been proposed for server farms.
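As a concrete single-server illustration (our own sketch, not from the paper; the function name and job encoding are assumptions), the cμ rule simply picks the job maximizing the product of holding cost and service rate:

```python
# Illustrative sketch of the c-mu rule at a single server (not from the
# paper).  Each queued job is a (c, mu) pair: its holding cost and its
# service rate (the reciprocal of its expected size).  The rule always
# serves the job with the largest product c * mu.

def c_mu_pick(jobs):
    """Return the index of the job the c-mu rule serves next."""
    return max(range(len(jobs)), key=lambda i: jobs[i][0] * jobs[i][1])

# Example: a cheap large job, an expensive large job, and a cheap small
# job.  Products c*mu are 0.5, 2.0, and 3.0, so the small cheap job
# is served first.
jobs = [(1.0, 0.5), (4.0, 0.5), (1.0, 3.0)]
assert c_mu_pick(jobs) == 2
```

The example shows why the rule favors jobs with high holding cost and/or small size: a small enough job can beat a more expensive but larger one.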

In this paper, we assume a server farm with a dispatcher and PS servers. Jobs arrive according to a Poisson process and are immediately dispatched to a server. The value, V, of an arrival is known, but its size, S, is not known. Furthermore, we assume that value and size are independent, so that knowing the job’s value does not give us information about the job’s size. We assume that we know the distribution of job values. Furthermore, job sizes are exponentially distributed with unit mean. By requiring that job sizes are exponentially distributed, we are consistent with the assumption that there is no way to estimate a job’s size; otherwise, we could use “age” information to update predictions on the remaining size of each job, and some of the policies of interest would become much more complex.1 Nothing else is known about future arrivals. In making dispatching decisions, we assume that we know the queue length at each server (the number of jobs at the PS server) as well as the values of the jobs at each server. In this context, we ask:

“What is a good dispatching policy to minimize E[VT]?”

Even in this simple setting, it is not at all obvious what makes a good dispatching policy. We consider several policies (see Section  4 for more detail):

  • The Random (RND) dispatching policy ignores job values and queue lengths. Arrivals are dispatched randomly.

  • The Join-Shortest-Queue (JSQ) dispatching policy ignores values and routes each job to the server with the fewest jobs. This policy is known to be optimal in the case where all values are equal [5].

  • The Value-Interval-Task-Assignment (VITA) dispatching policy is reminiscent of the SITA policy, except that jobs are segregated by value rather than size, with low-value jobs going to one server, medium-value jobs going to the next server, higher-value jobs going to the next server, and so on. The goal of this policy is to isolate high-value jobs from other jobs, so that the high-value jobs can experience low delay. The distribution of V and the system load ρ are used to determine the optimal threshold(s) for minimizing E[VT].

  • The C-MU dispatching policy is motivated by the cμ rule for scheduling in servers. Each arrival is dispatched so as to maximize the average instantaneous value of the jobs completing, assuming no future arrivals, where the average is taken over the servers. This policy makes use of the value of the arrival and the values of all the jobs at each server.

  • The Length-And-Value-Aware (LAVA) dispatching policy is very similar to the C-MU policy. Both policies incorporate queue length and job values in their decision. However, whereas C-MU places jobs so as to maximize the expected instantaneous value of jobs completed, LAVA places jobs so as to explicitly minimize E[VT] over jobs. Both policies make their decisions solely based on jobs already in the system.

This paper is the first to introduce the VITA, C-MU, and LAVA policies.
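To make the contrast between these decision rules concrete, here is a rough sketch of how JSQ, single-threshold VITA, and LAVA could choose a server (our own simplified rendering under the paper's model assumptions of unit-mean exponential sizes and rate-1 PS servers; the function names, the queue encoding, and the LAVA cost formula derived below are not from the paper):

```python
# Sketch of three dispatching decisions for PS servers (our own
# simplified rendering, not the paper's code).  Each queue is a list
# of job values; job sizes are exponential with mean 1 and each server
# runs at rate 1.  Ties go to the lower-indexed server.

def jsq(queues, v):
    """Join-the-Shortest-Queue: ignore values, pick the fewest jobs."""
    return min(range(len(queues)), key=lambda i: len(queues[i]))

def vita(queues, v, threshold):
    """Single-threshold VITA: low-value jobs to server 0, high-value
    jobs to server 1.  The queue states are ignored."""
    return 0 if v < threshold else 1

def lava(queues, v):
    """Length-And-Value-Aware: place the arrival where it adds the
    least to the total E[VT], assuming no future arrivals.

    With n exponential(1) jobs sharing a rate-1 PS server, departures
    occur at rate 1 and a tagged job's expected remaining sojourn is
    (n+1)/2.  Adding a value-v job to a queue holding n jobs of total
    value W therefore increases E[VT] by v*(n+2)/2 + W/2 (the new
    job's own sojourn plus a 1/2 delay to each existing job).
    """
    def cost(q):
        n, W = len(q), sum(q)
        return v * (n + 2) / 2 + W / 2
    return min(range(len(queues)), key=lambda i: cost(queues[i]))

# Server 0 holds three low-value jobs; server 1 holds one valuable job.
queues = [[1.0, 1.0, 1.0], [10.0]]
assert jsq(queues, 5.0) == 1    # JSQ sees only the lengths
assert lava(queues, 0.1) == 0   # LAVA keeps a cheap job away from the valuable one
```

The final assertions highlight the difference: JSQ would route a cheap arrival to the short queue and slow down the high-value job there, whereas LAVA accepts a longer queue for the cheap job to protect the valuable one.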

Via a combination of asymptotic analysis, exact analysis, and simulation, we show the following in Sections 5 and 6. We find that generally RND is worse than VITA, which is worse than JSQ, which is worse than LAVA. In fact, in an asymptotic regime we prove that as the system load ρ → 1, the ratio of E[VT] under RND : VITA : JSQ : LAVA approaches 4 : 2 : 2 : 1. The C-MU policy, on the other hand, avoids neat classification. There are value distributions and loads for which C-MU is the best policy of those we study, and others for which C-MU is the worst. In fact, C-MU can become unstable even when the system load ρ < 1. Finally, while VITA is generally not a great policy, we find that there are certain regimes under which VITA approaches optimality under light load (ρ < 1/2), performing far better than the other policies we study.

But is it possible to do even better than the above dispatching policies? We find that under a particularly skewed value distribution, there is a policy, “Gated VITA”, which can outperform all of the aforementioned policies by an arbitrary factor. The idea behind this policy is to split high and low value jobs, while using a “gate” to place a limit on the number of low-value jobs that can interfere with high-value jobs (see Section  7 for details). If one is willing to forego simplicity in the dispatching policies, one can further use first policy iteration to significantly improve upon simple policies (see Section  8 for details).
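The gating idea can be sketched as follows (our own illustrative reading of the description above; the gate size K, the single value threshold, and the behavior for overflow jobs are assumptions, not the paper's exact policy):

```python
# Illustrative sketch of the gating idea behind Gated VITA (our
# reading; the gate size K and the overflow behavior are assumptions).
# High-value arrivals go to their own server.  Low-value arrivals may
# enter service only while fewer than K of them are in service; any
# further low-value arrivals wait behind the gate.  This bounds the
# number of low-value jobs that can interfere at any time.

def g_vita(v, threshold, low_in_service, K):
    """Route an arrival: 'high' server, 'low' server, or 'gate' (wait)."""
    if v >= threshold:
        return "high"
    return "low" if low_in_service < K else "gate"

assert g_vita(10.0, 5.0, 0, 2) == "high"   # high-value jobs are isolated
assert g_vita(1.0, 5.0, 1, 2) == "low"     # gate open: fewer than K in service
assert g_vita(1.0, 5.0, 2, 2) == "gate"    # gate closed: wait
```

The point of the gate is that low-value jobs can never pile up at a server beyond the cap K, which limits how much delay they can impose.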

Section snippets

Prior work on value-driven dispatching

The problem of finding dispatching policies with the aim of minimizing value-weighted response time has received very little attention in the literature. Below we discuss the few papers in this setting, which are (only tangentially) related to our own.

One paper concerned with the minimization of an E[VT]-like metric is  [7], where a constant value parameter is associated with each server. In this setting, job values are not treated as exogenous random variables determined at the time of

Model for the PS server system

The basic system, illustrated in Fig. 1, is as follows:

  • We have m servers, each employing the Processor-Sharing (PS) scheduling discipline, with service rates μi. Throughout the simulation and analytic portions of the paper, we give particular attention to the case where m=2 and μ1=μ2.

  • Jobs arrive according to a Poisson process with rate λ and are immediately dispatched to one of the m servers.

  • Job j is defined by a pair (X(j),V(j)), where X(j) denotes the size of the job and V(j) is its value.

  • Job sizes obey

Description of simple dispatching policies

In describing our dispatching policies, it will be convenient to use the following terms.

Definition 1

The state of a queue consists of its queue length and the specific values of jobs at the queue.

Definition 2

A dispatching policy is called static if its decision is independent of the queue states and independent of all past placement of jobs.2

Simulation results and intuitions

In this section, we report our findings from simulation in a two-server system (with identical service rates) for the value distributions given in Table 1. We also provide intuition for the results. Formal proofs will be given in the next section. In all cases, E[V]=1, and the continuous distributions (a)–(c) are presented in increasing order of variability, as are the discrete distributions (d)–(f). The variance is particularly high for distributions (e) and (f). Note that while the fraction

Analytic results

Motivated by the observations in the previous section, we proceed to present and prove analytic results.

A (sometimes) far better policy: Gated VITA (G-VITA)

In Sections 5 and 6, we saw that the VITA policy often performs poorly relative to most of the other policies, except under sharp bimodal value distributions such as distribution (f) (cf. the definition of SBD(p) in Section  6.3). For such distributions, VITA is asymptotically optimal (as the value distribution grows increasingly sharp) for ρ<1/2. Although VITA continues to perform modestly well for these distributions when ρ>1/2, it does not

More complex policies via the First Policy Iteration (FPI)

Thus far, we have considered only simple, intuitive dispatching policies. In this section, we analyze the value-aware dispatching problem in the framework of Markov decision processes (MDP)  [34], [35], [36]. This will lead us to policies that often perform better than our existing policies, but are more complex and less intuitive.

We start with a tutorial example to explain the FPI approach. Consider a two-server system. If this system were to use the RND dispatching policy, arrivals would be

Conclusion

This paper presents the first comprehensive study of dispatching policies that aim to minimize value-weighted response times under Processor-Sharing scheduling. We propose a large number of novel dispatching policies and compare these under a range of workloads, showcasing the fact that the value distribution and load can greatly impact the ranking of the policies. We also prove several intriguing results on the asymptotic behavior of these policies. Note that while we have assumed that job

Acknowledgments

The authors would like to thank the reviewers for their helpful comments. Special thanks to Bruno Gaujal and Gautam Iyer for their assistance in refining some of the paper’s technical details. The second author’s work has been supported by the Academy of Finland in TOP-Energy project (grant no. 268992). The third author’s work was funded by NSF-CMMI-1334194 as well as a Computational Thinking grant from Microsoft Research.

Sherwin Doroudi is a Ph.D. student in Operations Management at the Tepper School of Business, Carnegie Mellon University, where he is advised by Mor Harchol-Balter and Mustafa Akan. His research is focused on the analysis of queueing systems with user heterogeneity. Currently, his work involves using techniques from both queueing theory and mechanism design in order to study service systems where delay-sensitive customers behave strategically.

References (42)

  • Microsoft sharepoint 2010 load balancer,...
  • F. Bonomi

    On job assignment for a parallel system of processor sharing queues

    IEEE Trans. Comput.

    (1990)
  • E. Altman et al.

    Load balancing in processor sharing systems

    Telecommun. Syst.

    (2011)
  • H. Feng et al.

    Mixed scheduling disciplines for network flows

    SIGMETRICS Perform. Eval. Rev.

    (2003)
  • M. Bramson, Y. Lu, B. Prabhakar, Randomized load balancing with general service time distributions, in: Proceedings of...
  • A. Mukhopadhyay, R.R. Mazumdar, Analysis of load balancing in large heterogeneous processor sharing systems,...
  • M. Harchol-Balter

    Performance Modeling and Design of Computer Systems: Queueing Theory in Action

    (2013)
  • M. Harchol-Balter, Task assignment with unknown duration, J. ACM...
  • M. El-Taha et al.

    Allocation of service time in a multiserver system

    Manag. Sci.

    (2006)
  • E. Bachmat et al.

    Analysis of size interval task assignment policies

    Perform. Eval. Rev.

    (2008)
  • V. Cardellini et al.

    The state of the art in locally distributed web-server systems

    ACM Comput. Surv.

    (2002)

Esa Hyytiä received the M.Sc. (Tech.) degree in engineering physics and Dr.Sc. (Tech.) degree in electrical engineering from the Helsinki University of Technology, in 1998 and 2004, respectively. In 2013, he was awarded a docentship in performance analysis of communication networks at the Aalto University School of Electrical Engineering. In 1997, he joined the Laboratory of Telecommunications of Helsinki University of Technology (TKK). From 2005 to 2006, he was with the Norwegian University of Science and Technology (NTNU), Norway as a postdoc researcher, and from 2005 to 2009, with the Telecommunication Research Center Vienna (FTW), Austria, as a senior researcher. Currently, he is working as a senior research scientist at Aalto University, Finland. His research interests include queueing theory and performance analysis and optimization of computer and communications systems.

Mor Harchol-Balter is a Professor of Computer Science at Carnegie Mellon University, where she has been since 1999. From 2008 to 2011, she served as the Associate Department Head for Computer Science. Mor received her doctorate in Computer Science from the University of California at Berkeley in 1996 and then spent three years at MIT under the NSF Postdoctoral Fellowship in the Mathematical Sciences. Mor is heavily involved in the ACM SIGMETRICS research community, where she served as Technical Program Chair for Sigmetrics 2007 and as General Chair for Sigmetrics 2013. Mor’s work involves the design, analysis, and implementation of new resource allocation policies for distributed systems, including load balancing policies, power management policies, and scheduling policies. She has co-authored over 100 publications in top journals and conferences, including a textbook, “Performance Analysis and Design of Computer Systems,” published by Cambridge University Press in 2013.
