Server farms with setup costs
Introduction
Motivation
Server farms are ubiquitous in manufacturing systems, call centers and service centers. In manufacturing systems, machines are usually turned off when they have no work to do, in order to save on operating costs. Likewise, in call centers and service centers, employees can be dismissed when there are not enough customers to serve. However, there is usually a setup cost involved in turning on a machine, or in bringing back an employee. This setup cost is typically in the form of a time delay. Thus, an important question in manufacturing systems, call centers and service centers is whether it pays to turn machines/employees “off” when there is not enough work to do.
Server farms are also prevalent in data centers. In data centers, servers consume peak power when they are servicing a job, but still consume about 60% [1] of that peak power when they are idle. Idle servers can be turned off to save power. Again, however, there is a setup cost involved in turning a server back on. This setup cost is in the form of a time delay and a power penalty, since the server consumes peak power during the entire duration of the setup time. An open question in data centers is whether it pays (from a delay perspective and a power perspective) to turn servers off when they are idle.
Model
Abstractly, we can model a server farm with setup costs using the M/M/k queueing system, with a Poisson arrival process with rate λ, and exponentially distributed job sizes, denoted by the random variable S, where E[S] = 1/μ. Let ρ = λ/μ denote the system load. Thus, for stability, we require ρ < k. In this model, a server can be in one of four states: on, idle, off, or in setup. A server is in the on state when it is serving jobs. When the server is on, it consumes power P_ON. If there are no jobs to serve, the server can either remain idle, or be turned off, where there is no time delay to turn a server off. If a server remains idle, it consumes non-zero power P_IDLE, which is assumed to be less than P_ON. If the server is turned off, it consumes zero power. So 0 = P_OFF < P_IDLE < P_ON.
To turn on an off server, the server must first be put in setup mode. While in setup, a server cannot serve jobs. The time it takes for a server in setup mode to turn on is called the setup time, and during that entire time, power P_ON is consumed. We model the setup time as an exponentially distributed random variable, I, with rate α.
We model our server farm using an M/M/k with a single central First Come First Served (FCFS) queue, from which servers pick jobs when they become free. Fig. 1 illustrates our server farm model. Every server is either on, idle, off, or in setup mode.
We consider the following three operating policies:
- 1. ON/IDLE: Under this policy, servers are never turned off. Servers all start in the idle mode, and remain in the idle mode when there are no jobs to serve. All servers are either on or idle. We model this policy by using the M/M/k queueing system. The response time analysis is well known, and the analysis of power consumption is straightforward, since it only requires knowing the expected number of servers which are on as opposed to idle.
- 2. ON/OFF: Under this policy, servers are immediately turned off when not in use. However, there is a setup cost (in terms of delay and power) for turning on an off server. At any point in time there are i on servers and j jobs in the system, where k is the total number of servers in the system. The number of servers in setup is then min(j, k) − i. The above facts follow from the property that any server not in use is immediately switched off. In more detail, there are three types of jobs: those currently running at an on server (we call these “running” jobs), those currently waiting for a server to set up (we call these “setting up” jobs), and those jobs in the queue that could not find a server to set up (we call these “waiting” jobs). An arriving job will always try to turn on an off server, if there is one available, by putting it into setup mode. Later arrivals may not be able to turn on a server, since all servers might already be on or in setup mode, and hence will become “waiting” jobs. Let s denote the first (to arrive) of the “setting up” jobs, if there is one, and let w denote the first of the “waiting” jobs, if there is one. When a “running” job r completes service, its server is transferred to s, if s exists, or else to w, if w exists, or else it is turned off if neither s nor w exists. If r’s server was transferred to s, then the server that was setting up for s is now handed over to w, if w exists; otherwise that server is turned off. This will become clearer when we consider the Markov chain model for the ON/OFF policy.
- 3. ON/OFF_STAG: This model is known as the “staggered boot up” model in data centers, or “staggered spin up” in disk farms [2], [3]. The policy is the same as the ON/OFF policy, except that in the ON/OFF_STAG policy, at most 1 server can be in setup at any point of time. Thus, if there are i on servers, and j jobs in the system, then under the ON/OFF_STAG policy, there will be min{min(j, k) − i, 1} servers in setup, where k is the total number of servers in the system. The ON/OFF_STAG policy is believed to avoid excessive power consumption.
- 4. M/M/∞: This model can be viewed as the ON/OFF policy with an infinite number of servers. Thus, in this model, we can have an infinite number of servers in setup.
- 5. M/M/∞ with staggered setup: The M/M/∞ with staggered setup is the same as the M/M/∞, except that at most k servers can be in setup at any point of time.
Prior work
Prior work on server farms with setup costs has focused largely on single servers. There is very little work on multi-server systems with setup costs. In particular, no closed-form solutions are known for the ON/OFF and the ON/OFF_STAG policies. For the ON/OFF_STAG, Gandhi and Harchol-Balter have obtained closed-form solutions for the mean response time [6], but no results are known for the distribution of response time.
Results
For the ON/OFF_STAG, we provide the first analysis of the distribution of response time. In particular, we prove that the distribution of response time can be decomposed into the sum of the response time for the M/M/k and the setup time (see Section 4). For the M/M/∞, we provide closed-form solutions for the limiting probabilities, and also observe an interesting decomposition property on the number of jobs in the system. These can then be used to derive the mean response time and mean power consumption in the M/M/∞ (see Section 5). For the ON/OFF, we come up with closed-form approximations for the mean response time which work well under all ranges of load and setup times, except the regime where both the load and the setup time are high. Understanding the ON/OFF in the regime where both the load and the setup time are high is less important, since in this regime, as we will show, it pays to leave servers on (the ON/IDLE policy). Both of our approximations for the ON/OFF are based on the truncation of systems where we have an infinite number of servers (see Section 6). Finally, we analyze the limiting behavior of server farms with setup costs as the number of jobs in the system becomes very high. One would think that all servers should be on in this case. Surprisingly, our derivations show that the limit of the expected number of on servers converges to a quantity that can be much less than k. This type of limiting analysis leads to yet another approximation for the mean response time for the ON/OFF (see Section 7).
Impact/Application
Using our analysis of server farms with setup costs, we answer many interesting policy design questions that arise in data centers. Each question is answered both with respect to mean response time and mean power consumption. These include, for example: “Under what conditions is it beneficial to turn servers off, to save power? (ON/IDLE vs. ON/OFF)”; “Does it pay to limit the number of servers that can be in setup? (ON/OFF vs. ON/OFF_STAG)”; “Can one create a superior strategy by mixing two strategies with a threshold for switching between them?”; “How are results affected by the number of servers, load, and setup time?” (see Section 8).
Section snippets
Prior work
Single server with setup costs: For a single server, Welch [7] considered the M/G/1 queue with general setup times, and showed that the mean response time can be decomposed into the sum of the mean response time for the M/G/1 and the mean of the residual setup time. In [8], Takagi considers a multi-class M/G/1 queue with setup times and a variety of
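Welch’s decomposition can be checked numerically. The sketch below simulates an FCFS M/M/1 queue via a Lindley-style recursion in which the first customer of each busy period receives an extra exponential setup time; for exp(α)-distributed setup, the adjusted residual-setup term works out to exactly 1/α, so E[T] = 1/(μ − λ) + 1/α. Function name and parameter values are illustrative, not from the paper:

```python
import numpy as np

def mean_response_mm1_setup(lam, mu, alpha, n=400_000, seed=1):
    """Simulate FCFS M/M/1 where a customer arriving to an empty system
    must first wait out an exp(alpha) setup (Welch's 'exceptional first
    service' model). Returns the estimated mean response time."""
    rng = np.random.default_rng(seed)
    A = rng.exponential(1.0 / lam, n)    # interarrival times
    S = rng.exponential(1.0 / mu, n)     # service times
    I = rng.exponential(1.0 / alpha, n)  # setup times
    wait = 0.0                           # Lindley waiting time of current customer
    total = 0.0
    for t in range(n):
        # wait == 0 means the arrival found the system empty: setup is triggered
        service = S[t] + (I[t] if wait == 0.0 else 0.0)
        total += wait + service          # response time = wait + (setup +) service
        if t + 1 < n:
            wait = max(wait + service - A[t + 1], 0.0)
    return total / n

est = mean_response_mm1_setup(lam=0.5, mu=1.0, alpha=1.0)
# theory: 1/(mu - lam) + 1/alpha = 2 + 1 = 3
```

With λ = 0.5, μ = 1 and α = 1, the estimate settles near the theoretical value 3, matching the decomposition: the M/M/1 response time 1/(μ − λ) = 2 plus the setup term 1/α = 1.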
In the ON/IDLE model (see Section 1), servers become idle when they have no jobs to serve. Thus, the mean response time, E[T], and the mean power consumption, E[P], are given by:

E[T] = E[T(M/M/k)],  (1)
E[P] = ρ·P_ON + (k − ρ)·P_IDLE.  (2)

In Eq. (2), observe that ρ is the expected number of on servers, and k − ρ is the expected number of idle servers.
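A minimal sketch of the ON/IDLE metrics, assuming E[T] is the plain M/M/k response time (computed via the standard Erlang-C formula) and E[P] = ρ·P_ON + (k − ρ)·P_IDLE; the power values used below are illustrative only:

```python
from math import factorial

def on_idle_metrics(lam, mu, k, p_on, p_idle):
    """Mean response time and mean power for the ON/IDLE policy,
    modeled as a plain M/M/k; requires rho = lam/mu < k for stability."""
    rho = lam / mu
    assert rho < k, "unstable: require lam/mu < k"
    # Erlang-C: probability that an arriving job must queue
    term = (rho ** k / factorial(k)) * (k / (k - rho))
    p_queue = term / (sum(rho ** n / factorial(n) for n in range(k)) + term)
    ET = 1.0 / mu + p_queue / (k * mu - lam)  # M/M/k mean response time
    EP = rho * p_on + (k - rho) * p_idle      # on-power plus idle-power
    return ET, EP
```

As sanity checks: for k = 1 this reduces to the M/M/1 formula E[T] = 1/(μ − λ), and for k = 2 with λ = μ = 1 the queueing probability is the classic 1/3, giving E[T] = 4/3.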
In data centers, it is common to turn idle servers off to save power. When a server is turned on again, it incurs a setup cost, both in terms of a time delay and a power penalty. If there is a sudden burst of arrivals into the system, then many servers might be turned on simultaneously, resulting in a huge power draw, since servers in setup consume peak power. To avoid excessive power draw, data center operators sometimes limit the number of servers that can be in setup at any point of time.
Many data centers today, including those of Google, Microsoft, Yahoo and Amazon, consist of tens of thousands of servers [4], [5]. In such settings, we can model a server farm with setup costs as the M/M/∞ system, as shown in Fig. 3. For this model, we make an educated guess for the limiting probabilities.
Theorem 2. For the M/M/∞ Markov chain, as shown in Fig. 3, the limiting probabilities are given by:
ON/OFF: approximations based on the M/M/∞
Under the ON/OFF model, we assume a fixed finite number of servers, k, each of which can be either on, off, or in setup. Fig. 4 shows the Markov chain, with states (i, j), where i represents the number of servers on, and j represents the number of jobs in the system. Given i and j, we have exactly min(j, k) − i servers in setup. Since the Markov chain for the ON/OFF (shown in Fig. 4) looks similar to the Markov chain shown in Fig. 2, one would expect that the
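The stationary distribution of a queue-truncated version of this chain can be solved numerically. The sketch below assumes the transition structure implied by the description above (arrivals at rate λ; service completions at rate iμ taking (i, j) to (min(i, j − 1), j − 1), turning a server off if no job can use it; setup completions at rate (min(j, k) − i)·α taking (i, j) to (i + 1, j)); this is our reading of the dynamics, not code from the paper. For k = 1 the result can be checked against the known M/M/1-with-setup formula E[T] = 1/(μ − λ) + 1/α.

```python
import numpy as np

def on_off_mean_jobs(lam, mu, alpha, k, jmax=60):
    """Solve pi Q = 0 for the ON/OFF Markov chain truncated at jmax jobs.
    States (i, j): i servers on, j jobs, with 0 <= i <= min(j, k).
    Returns E[N], the mean number of jobs in the system."""
    states = [(i, j) for j in range(jmax + 1) for i in range(min(j, k) + 1)]
    idx = {s: n for n, s in enumerate(states)}
    Q = np.zeros((len(states), len(states)))
    for (i, j), n in idx.items():
        if j < jmax:                      # arrival (blocked at the truncation)
            Q[n, idx[(i, j + 1)]] += lam
        if i > 0:                         # service completion; server may turn off
            Q[n, idx[(min(i, j - 1), j - 1)]] += i * mu
        s = min(j, k) - i                 # servers currently in setup
        if s > 0:                         # one of them finishes setup
            Q[n, idx[(i + 1, j)]] += s * alpha
    Q -= np.diag(Q.sum(axis=1))           # diagonal = -(sum of exit rates)
    # replace one balance equation with the normalization sum(pi) = 1
    A = Q.T.copy()
    A[-1, :] = 1.0
    b = np.zeros(len(states)); b[-1] = 1.0
    pi = np.linalg.solve(A, b)
    return sum(p * j for (i, j), p in zip(states, pi))
```

For k = 1, λ = 0.5, μ = 1, α = 0.5, the formula gives E[T] = 2 + 2 = 4, so by Little’s law E[N] = λ·E[T] = 2; the truncated-chain solution reproduces this to within the (tiny) truncation error.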
ON/OFF: asymptotic approximation as the number of jobs approaches infinity
Thus far, we have approximated the ON/OFF model by using the truncated M/M/∞ model and the truncated M/M/∞ with staggered setup model, both of which have a 2-dimensional Markov chain. If we can approximate the ON/OFF model by using a simple 1-dimensional random walk, then we might get very simple closed-form expressions for the mean response time and the mean power consumption. To do this, we’ll need a definition: Definition 1. For the ON/OFF, define the expected number of on servers, given that there are
Application
In data centers today, both response time and power consumption are important performance metrics. However, there is a tradeoff between leaving servers idle and turning them off. Leaving servers idle when they have no work to do results in excessive power consumption, since idle servers consume as much as 60% of peak power [1]. On the other hand, turning servers off when they have no work to do incurs a setup cost (in terms of both a time delay and peak power consumption during that time).
Conclusion
In this paper we consider server farms with a setup cost, which are common in manufacturing systems, call centers and data centers. In such settings, a server (or machine) can be turned off to save power (or operating costs), but turning on an off server incurs a setup cost. The setup cost usually takes the form of a time delay, and sometimes there is an additional power penalty as well. While the effect of setup costs is well understood for a single server, multi-server systems with setup
References (18)
- On a batch arrival Poisson queue with a random setup time and vacation period, Comput. Oper. Res. (1998).
- The effect of different arrival rates on the N-policy of M/G/1 with server setup, Appl. Math. Model. (1999).
- The case for energy-proportional computing, Computer (2007).
- Intel Corporation, Serial ATA staggered spin-up, White Paper, September...
- M.W. Storer, K.M. Greenan, E.L. Miller, K. Voruganti, Pergamum: replacing tape with energy efficient, reliable,...
- CNET News, Google spotlights data center inner workings, 2008...
- Data Center Knowledge, Who has the most web servers?, 2009...
- A. Gandhi, M. Harchol-Balter, M/G/k with exponential setup, Tech. Rep. CMU-CS-09-166, School of Computer Science,...
- P.D. Welch, On a generalized M/G/1 queueing process in which the first customer of each busy period receives exceptional service, Oper. Res. (1964).
Anshul Gandhi is a Ph.D. student in the Computer Science Department at Carnegie Mellon University, under the direction of Mor Harchol-Balter. His research involves designing and implementing power management policies for datacenters as well as general performance modeling of computer systems.
Mor Harchol-Balter is an Associate Professor of Computer Science at Carnegie Mellon University and also serves as the Associate Department Head for the Computer Science Department. She is heavily involved in the ACM SIGMETRICS/Performance research community and recently served as Technical Program Chair for SIGMETRICS. Mor’s work focuses on designing new resource allocation policies (load balancing policies, power management policies, and scheduling policies) for server farms and distributed systems in general. Her research spans both queueing analysis and systems implementation.
Ivo Adan is an associate professor in the department of Mathematics and Computer Science of the Eindhoven University of Technology. Since 2009, he also works as a part-time full professor at the Operations Research and Management group at the University of Amsterdam. His current research interests are in the analysis of multi-dimensional Markov processes and queueing models, and in the performance evaluation of communication, production and warehousing systems. His email address is [email protected].