Performance Evaluation
Volume 67, Issue 11, November 2010, Pages 1123-1138

Server farms with setup costs

https://doi.org/10.1016/j.peva.2010.07.004

Abstract

In this paper we consider server farms with a setup cost. This model is common in manufacturing systems and data centers, where there is a cost to turn servers on. Setup costs always take the form of a time delay, and sometimes there is additionally a power penalty, as in the case of data centers. Any server can be either on, off, or in setup mode. While prior work has analyzed single servers with setup costs, no analytical results are known for multi-server systems. In this paper, we derive the first closed-form solutions and approximations for the mean response time and mean power consumption in server farms with setup costs. We also analyze variants of server farms with setup, such as server farm models with staggered boot up of servers, where at most one server can be in setup mode at a time, or server farms with an infinite number of servers. For some variants, we find that the distribution of response time can be decomposed into the sum of response time for a server farm without setup and the setup time. Finally, we apply our analysis to data centers, where both response time and power consumption are key metrics. Here we analyze policy design questions such as whether it pays to turn servers off when they are idle, whether staggered boot up helps, how to optimally mix policies, and other questions related to the optimal data center size.

Introduction

Motivation

Server farms are ubiquitous in manufacturing systems, call centers and service centers. In manufacturing systems, machines are usually turned off when they have no work to do, in order to save on operating costs. Likewise, in call centers and service centers, employees can be dismissed when there are not enough customers to serve. However, there is usually a setup cost involved in turning on a machine, or in bringing back an employee. This setup cost is typically in the form of a time delay. Thus, an important question in manufacturing systems, call centers and service centers, is whether it pays to turn machines/employees “off”, when there is not enough work to do.

Server farms are also prevalent in data centers. In data centers, servers consume peak power when they are servicing a job, but still consume about 60% of that peak power when they are idle [1]. Idle servers can be turned off to save power. Again, however, there is a setup cost involved in turning a server back on. This setup cost is in the form of a time delay and a power penalty, since the server consumes peak power during the entire duration of the setup time. An open question in data centers is whether it pays (from a delay perspective and a power perspective) to turn servers off when they are idle.

Model

Abstractly, we can model a server farm with setup costs using the M/M/k queueing system, with a Poisson arrival process with rate λ, and exponentially distributed job sizes, denoted by random variable S ∼ Exp(μ). Let ρ = λ/μ denote the system load, where 0 ≤ ρ < k. Thus, for stability, we require λ < kμ. In this model, a server can be in one of four states: on, idle, off, or in setup. A server is in the on state when it is serving jobs. When the server is on, it consumes power Pon. If there are no jobs to serve, the server can either remain idle, or be turned off, where there is no time delay to turn a server off. If a server remains idle, it consumes non-zero power Pidle, which is assumed to be less than Pon. If the server is turned off, it consumes zero power. So 0 = Poff < Pidle < Pon.

To turn on an off server, the server must first be put in setup mode. While in setup, a server cannot serve jobs. The time it takes for a server in setup mode to turn on is called the setup time, and during that entire time, power Pon is consumed. We model the setup time as an exponentially distributed random variable, I, with rate α = 1/E[I].

We model our server farm using an M/M/k with a single central First Come First Served (FCFS) queue, from which servers pick jobs when they become free. Fig. 1 illustrates our server farm model. Every server is either on, idle, off, or in setup mode.

We consider the following three operating policies:

  • 1.

    ON/IDLE: Under this policy, servers are never turned off. Servers all start in the idle mode, and remain in the idle mode when there are no jobs to serve. All servers are either on or idle. We model this policy by using the M/M/k queueing system. The response time analysis is well known, and the analysis of power consumption is straightforward, since it only requires knowing the expected number of servers which are on as opposed to idle.

  • 2.

    ON/OFF: Under this policy, servers are immediately turned off when not in use. However, there is a setup cost (in terms of delay and power) for turning on an off server. At any point in time there are i ≤ k on servers, and j ≥ i jobs in the system, where k is the total number of servers in the system. The number of servers in setup is then min{j−i, k−i}. The above facts follow from the property that any server not in use is immediately switched off. In more detail, there are three types of jobs: those that are currently running at an on server (we call these “running” jobs), those that are currently waiting for a server to set up (we call these “setting up” jobs), and those jobs in the queue that could not find a server to set up (we call these “waiting” jobs). An arriving job will always try to turn on an off server, if there is one available, by putting it into setup mode. Later arrivals may not be able to turn on a server, since all servers might already be on or in setup mode, and hence will become “waiting” jobs. Let B denote the first (to arrive) of the “setting up” jobs, if there is one, and let C be the first of the “waiting” jobs, if there is one. When a “running” job, A, completes service, its server, sA, is transferred to B, if B exists, or else to C, if C exists, or else is turned off if neither B nor C exists. If sA was transferred to B, then B’s server, sB, is now handed over to job C, if it exists; otherwise sB is turned off. This will become clearer when we consider the Markov chain model for the ON/OFF policy.

  • 3.

    ON/OFF/STAG: This model is known as the “staggered boot up” model in data centers, or “staggered spin up” in disk farms [2], [3]. The ON/OFF/STAG policy is the same as the ON/OFF policy, except that in the ON/OFF/STAG policy, at most 1 server can be in setup at any point of time. Thus, if there are i on servers, and j jobs in the system, then under the ON/OFF/STAG policy, there will be min{1, k−i} servers in setup, where k is the total number of servers in the system. The ON/OFF/STAG policy is believed to avoid excessive power consumption.

Of the above policies, the ON/OFF policy is the most difficult to analyze. In order to analyze this policy, it will be useful to first analyze the limiting behavior of the system as the number of servers goes to infinity. We will analyze two models with infinite servers:
  • 4.

    ON/OFF(∞): This model can be viewed as the ON/OFF policy model with an infinite number of servers. Thus, in this model, we can have an infinite number of servers in setup.

  • 5.

    ON/OFF(∞)/kSTAG: The ON/OFF(∞)/kSTAG is the same as the ON/OFF(∞), except that in the ON/OFF(∞)/kSTAG, at most k servers can be in setup at any point of time.
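The five policies above differ only in how many servers may be in setup in a given state. A small helper makes the counts explicit (a sketch; the function name and policy labels are ours, not the paper's notation), writing i for the number of on servers, j for the number of jobs, and k for the total number of servers (or, for ON/OFF(∞)/kSTAG, the cap on servers in setup):

```python
# Number of servers in setup, given i on servers and j jobs in the system.
# Illustrative helper; names and policy strings are ours, not the paper's.
def setup_count(policy, i, j, k=None):
    pending = max(j - i, 0)            # jobs not yet holding an on server
    if policy == "ON/OFF":             # k servers total
        return min(pending, k - i)
    if policy == "ON/OFF/STAG":        # at most one server in setup at a time
        return min(1, pending, k - i)
    if policy == "ON/OFF(inf)":        # infinitely many servers available
        return pending
    if policy == "ON/OFF(inf)/kSTAG":  # at most k servers in setup at a time
        return min(pending, k)
    raise ValueError(policy)
```

For example, with i = 2 on servers, j = 5 jobs, and k = 4 servers total, ON/OFF puts both remaining off servers into setup, while ON/OFF/STAG puts only one.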

The infinite server models are also useful for modeling large data centers, where the number of servers is usually in the thousands [4], [5]. Throughout this paper, we will use the notation Tpolicy (respectively, Ppolicy) to denote the response time (respectively, power consumption), where the placeholder “policy” will be replaced by one of the above policies, e.g., ON/OFF.
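Since all clocks in the model are exponential, the ON/OFF policy can also be explored by simulating its Markov chain directly, jumping from event to event. The sketch below (our own illustrative code; the parameter values are arbitrary, not from the paper) tracks the state (i, j), with min{j−i, k−i} servers in setup, and estimates the mean response time via Little's law:

```python
# Illustrative event-by-event simulation of the ON/OFF Markov chain.
# State (i, j): i servers on, j jobs in system; min(j-i, k-i) servers in setup.
import random

def simulate_on_off(lam, mu, alpha, k, events=200_000, seed=1):
    rng = random.Random(seed)
    i, j = 0, 0          # servers on, jobs in system
    t = 0.0
    area_jobs = 0.0      # integral of j dt, for the time-average E[N]
    for _ in range(events):
        s = min(j - i, k - i)             # servers currently in setup
        rate = lam + i * mu + s * alpha   # total transition rate out of (i, j)
        dt = rng.expovariate(rate)
        t += dt
        area_jobs += j * dt
        u = rng.random() * rate
        if u < lam:                       # arrival (may put a server in setup)
            j += 1
        elif u < lam + i * mu:            # service completion
            j -= 1
            if j < i:                     # no job to hand the server to: turn off
                i -= 1
        else:                             # a setup completes: one more server on
            i += 1
    e_n = area_jobs / t                   # time-average number of jobs
    return e_n / lam                      # Little's law: E[T] = E[N] / lam

e_t = simulate_on_off(lam=2.0, mu=1.0, alpha=0.5, k=10)
```

The service-completion branch encodes the handoff rule from the ON/OFF description: the freed server is reused whenever a non-running job exists (j > i after the departure), and is turned off only when all remaining jobs are already running.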

Prior work

Prior work on server farms with setup costs has focused largely on single servers. There is very little work on multi-server systems with setup costs. In particular, no closed-form solutions are known for the ON/OFF and the ON/OFF(∞). For the ON/OFF/STAG, Gandhi and Harchol-Balter have obtained closed-form solutions for the mean response time [6], but no results are known for the distribution of response time.

Results

For the ON/OFF/STAG, we provide the first analysis of the distribution of response time. In particular, we prove that the distribution of response time can be decomposed into the sum of response time for the ON/IDLE and the setup time (see Section 4). For the ON/OFF(∞), we provide closed-form solutions for the limiting probabilities, and also observe an interesting decomposition property on the number of jobs in the system. These can then be used to derive the mean response time and mean power consumption in the ON/OFF(∞) (see Section 5). For the ON/OFF, we come up with closed-form approximations for the mean response time which work well under all ranges of load and setup times, except the regime where both the load and the setup time are high. Understanding the ON/OFF in the regime where both the load and the setup time are high is less important, since in this regime, as we will show, it pays to leave servers on (ON/IDLE policy). Both of our approximations for the ON/OFF are based on the truncation of systems where we have an infinite number of servers (see Section 6). Finally, we analyze the limiting behavior of server farms with setup costs as the number of jobs in the system becomes very high. One would think that all k servers should be on in this case. Surprisingly, our derivations show that the limit of the expected number of on servers converges to a quantity that can be much less than k. This type of limiting analysis leads to yet another approximation for the mean response time for the ON/OFF (see Section 7).

Impact/Application

Using our analysis of server farms with setup costs, we answer many interesting policy design questions that arise in data centers. Each question is answered both with respect to mean response time and mean power consumption. These include, for example, “Under what conditions is it beneficial to turn servers off, to save power? (ON/IDLE vs. ON/OFF)”; “Does it pay to limit the number of servers that can be in setup? (ON/OFF vs. ON/OFF/STAG)”; “Can one create a superior strategy by mixing two strategies with a threshold for switching between them?”; “How are results affected by the number of servers, load, and setup time?” (see Section 8).

Section snippets

Prior work

Prior work on server farms with setup costs has focused largely on single servers. There is very little work on multi-server systems with setup costs.

Single server with setup costs: For a single server, Welch [7] considered the M/G/1 queue with general setup times, and showed that the mean response time can be decomposed into the sum of mean response time for the M/G/1 and the mean of the residual setup time. In [8], Takagi considers a multi-class M/G/1 queue with setup times and a variety of

ON/IDLE

In the ON/IDLE model (see Section 1), servers become idle when they have no jobs to serve. Thus, the mean response time, E[T_{ON/IDLE}], and the mean power consumption, E[P_{ON/IDLE}], are given by:

E[T_{ON/IDLE}] = \frac{\pi_0 \rho^k}{k!\,(1-\rho/k)^2\,k\mu} + \frac{1}{\mu}, \quad \text{where } \pi_0 = \left[\sum_{i=0}^{k-1}\frac{\rho^i}{i!} + \frac{\rho^k}{k!\,(1-\rho/k)}\right]^{-1}   (1)

E[P_{ON/IDLE}] = \rho P_{on} + (k-\rho)P_{idle}.   (2)

In Eq. (2), observe that ρ is the expected number of on servers, and (k − ρ) is the expected number of idle servers.
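Eqs. (1)–(2) are straightforward to evaluate numerically. The sketch below (our own code; the power values are illustrative placeholders, not figures from the paper) computes both metrics. A convenient sanity check: for k = 1, Eq. (1) collapses to the familiar M/M/1 value 1/(μ − λ).

```python
# Mean response time and mean power for ON/IDLE (plain M/M/k), per Eqs. (1)-(2).
# Power values p_on, p_idle are illustrative placeholders, not from the paper.
from math import factorial

def on_idle_metrics(lam, mu, k, p_on=240.0, p_idle=150.0):
    rho = lam / mu                       # system load; requires rho < k
    pi0 = 1.0 / (sum(rho**i / factorial(i) for i in range(k))
                 + rho**k / (factorial(k) * (1 - rho / k)))
    e_t = pi0 * rho**k / (factorial(k) * (1 - rho / k)**2 * k * mu) + 1 / mu
    e_p = rho * p_on + (k - rho) * p_idle
    return e_t, e_p

e_t, e_p = on_idle_metrics(lam=2.0, mu=1.0, k=10)
```

Here ρ = 2 of the 10 servers are busy on average, so the mean power is 2·Pon + 8·Pidle.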

ON/OFF/STAG

In data centers, it is common to turn idle servers off to save power. When a server is turned on again, it incurs a setup cost, both in terms of a time delay and a power penalty. If there is a sudden burst of arrivals into the system, then many servers might be turned on simultaneously, resulting in a huge power draw, since servers in setup consume peak power. To avoid excessive power draw, data center operators sometimes limit the number of servers that can be in setup at any point of time.

ON/OFF(∞)

Many data centers today, including those of Google, Microsoft, Yahoo and Amazon, consist of tens of thousands of servers [4], [5]. In such settings, we can model a server farm with setup costs as the ON/OFF(∞) system, as shown in Fig. 3. For this model, we make an educated guess for the limiting probabilities.

Theorem 2

For the ON/OFF(∞) Markov chain, as shown in Fig. 3, the limiting probabilities are given by:

\pi_{i,j} = \pi_{0,0}\,\frac{\rho^i}{i!}\prod_{l=1}^{j-i}\frac{\lambda}{\lambda + l\alpha}, \quad i \ge 0,\ j \ge i,

and

\pi_{0,0} = e^{-\rho}\left(\sum_{j=0}^{\infty}\prod_{l=1}^{j}\frac{\lambda}{\lambda + l\alpha}\right)^{-1} = \frac{e^{-\rho}}{M\left(1,\, 1+\frac{\lambda}{\alpha},\, \frac{\lambda}{\alpha}\right)},

where M(a,b,z) = \sum_{n=0}^{\infty} \frac{(a)_n}{(b)_n}\frac{z^n}{n!} is the confluent hypergeometric function.
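The series defining M(1, 1 + λ/α, λ/α) converges quickly, so the Theorem 2 probabilities are easy to evaluate numerically. The sketch below (our own code, with arbitrary parameter values) truncates the series and confirms that the π_{i,j} sum to 1:

```python
# Numerical check of the Theorem 2 limiting probabilities for ON/OFF(inf):
# truncate the series for M(1, 1 + lam/alpha, lam/alpha) and verify that
# the pi_{i,j} sum to 1. Parameter values are arbitrary examples.
from math import exp, factorial

lam, mu, alpha = 2.0, 1.0, 0.5
rho = lam / mu

def prod_term(m):
    # prod_{l=1}^{m} lam / (lam + l*alpha); equals 1 for m = 0
    p = 1.0
    for l in range(1, m + 1):
        p *= lam / (lam + l * alpha)
    return p

# Truncated series for M(1, 1 + lam/alpha, lam/alpha); terms decay fast
m_series = sum(prod_term(j) for j in range(200))
pi00 = exp(-rho) / m_series

# Sum pi_{i,j} over a grid large enough that the tail is negligible
total = sum(pi00 * rho**i / factorial(i) * prod_term(j - i)
            for i in range(60) for j in range(i, i + 200))
```

Since M(1, b, z) > 1 for z > 0, the check π_{0,0} < e^{−ρ} also follows immediately.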

ON/OFF: approximations based on the ON/OFF(∞)

Under the ON/OFF model, we assume a fixed finite number of servers k, each of which can be either on, off, or in setup. Fig. 4 shows the ON/OFF Markov chain, with states (i,j), where i represents the number of servers on, and j represents the number of jobs in the system. Given that j ≥ i and i ≤ k, we have exactly min{j−i, k−i} servers in setup. Since the Markov chain for the ON/OFF (shown in Fig. 4) looks similar to the Markov chain for the ON/OFF/STAG (shown in Fig. 2), one would expect that the

ON/OFF: asymptotic approximation as the number of jobs approaches infinity

Thus far, we have approximated the ON/OFF model by using the truncated ON/OFF(∞) model and the truncated ON/OFF(∞)/kSTAG model, both of which have a 2-dimensional Markov chain. If we can approximate the ON/OFF model by using a simple 1-dimensional random walk, then we might get very simple closed-form expressions for the mean response time and the mean power consumption. To do this, we’ll need a definition:

Definition 1

For the ON/OFF, ON(n) denotes the expected number of on servers, given that there are n

Application

In data centers today, both response time and power consumption are important performance metrics. However, there is a tradeoff between leaving servers idle and turning them off. Leaving servers idle when they have no work to do results in excessive power consumption, since idle servers consume as much as 60% of peak power [1]. On the other hand, turning servers off when they have no work to do incurs a setup cost (in terms of both a time delay and peak power consumption during that time).

We

Conclusion

In this paper we consider server farms with a setup cost, which are common in manufacturing systems, call centers and data centers. In such settings, a server (or machine) can be turned off to save power (or operating costs), but turning on an off server incurs a setup cost. The setup cost usually takes the form of a time delay, and sometimes there is an additional power penalty as well. While the effect of setup costs is well understood for a single server, multi-server systems with setup


References (18)

  • G. Choudhury

    On a batch arrival Poisson queue with a random setup time and vacation period

    Comput. Oper. Res.

    (1998)
  • S. Hur et al.

    The effect of different arrival rates on the N-policy of M/G/1 with server setup

    Appl. Math. Model.

    (1999)
  • L.A. Barroso et al.

    The case for energy-proportional computing

    Computer

    (2007)
  • I. Corporation, Serial ATA staggered spin-up, White Paper, September...
  • M.W. Storer, K.M. Greenan, E.L. Miller, K. Voruganti, Pergamum: replacing tape with energy efficient, reliable,...
  • CNET news, Google spotlights data center inner workings, 2008....
  • D.C. Knowledge, Who has the most web servers?, 2009....
  • A. Gandhi, M. Harchol-Balter, M/G/k with exponential setup, Tech. Rep. CMU-CS-09-166, School of Computer Science,...
  • P. Welch

    On a generalized M/G/1 queueing process in which the first customer of each busy period receives exceptional service

    Oper. Res.

    (1964)


Anshul Gandhi is a Ph.D. student in the Computer Science Department at Carnegie Mellon University, under the direction of Mor Harchol-Balter. His research involves designing and implementing power management policies for datacenters as well as general performance modeling of computer systems.

Mor Harchol-Balter is an Associate Professor of Computer Science at Carnegie Mellon University and also serves as the Associate Department Head for the Computer Science Department. She is heavily involved in the ACM SIGMETRICS/Performance research community and recently served as Technical Program Chair for SIGMETRICS. Mor’s work focuses on designing new resource allocation policies (load balancing policies, power management policies, and scheduling policies) for server farms and distributed systems in general. Her research spans both queueing analysis and systems implementation.

Ivo Adan is an associate professor in the department of Mathematics and Computer Science of the Eindhoven University of Technology. Since 2009, he also works as a part-time full professor at the Operations Research and Management group at the University of Amsterdam. His current research interests are in the analysis of multi-dimensional Markov processes and queueing models, and in the performance evaluation of communication, production and warehousing systems. His email address is [email protected].
