How many servers are best in a dual-priority M/PH/k system?
Introduction
The fundamental question of “how many servers are best” has been asked by a stream of research [20], [30], [17], [27], [26], [28] discussed in Section 2.2: Is it preferable to use a single fast server of speed s, or k slow servers each of speed s/k? What is the optimal k? All of this work considers systems where jobs are serviced in first-come-first-served (FCFS) order, and asks both whether a single fast server is preferable to slow servers (of equal total capacity) and how the optimal number of servers varies across workloads. We address these questions in the context of Poisson arrivals and phase-type (PH) service times and find that, when load and service demand variability are high, multiple slow servers are preferable. We compute the optimal number of servers as a function of the load and the service demand variability, and show that using more slow servers can sometimes reduce mean response time (sojourn time) by an order of magnitude, whereas using too many slow servers can have the opposite effect.
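The trade-off above is easy to see in the special case of exponential service times, where the k-server system is a plain M/M/k and the standard Erlang-C formula applies. The following sketch (names `erlang_c` and `mean_response_time` are illustrative, not from the paper) compares one fast server against k slow servers of equal total capacity; with exponential job sizes (squared coefficient of variation 1), a single fast server comes out best, which is exactly why the paper's result hinges on high service demand variability:

```python
from math import factorial

def erlang_c(k, a):
    """Probability an arrival must queue in an M/M/k system with offered load a = lam/mu."""
    top = a**k / factorial(k)
    bottom = (1 - a / k) * sum(a**n / factorial(n) for n in range(k)) + top
    return top / bottom

def mean_response_time(lam, s, k):
    """Mean response time when total speed s is split over k servers,
    i.e. each server works at rate s/k on jobs of mean size 1."""
    mu = s / k                                  # per-server service rate
    assert lam < s, "system must be stable"
    wait = erlang_c(k, lam / mu) / (k * mu - lam)   # mean queueing delay
    return wait + 1 / mu                        # plus mean service time

lam, s = 0.8, 1.0
times = {k: mean_response_time(lam, s, k) for k in range(1, 9)}
best = min(times, key=times.get)
print(best, times[best])   # with exponential job sizes, one fast server wins
```

Under high-variability PH service times the paper shows the ordering can flip, which is not visible in this exponential special case.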
The above result is not all-encompassing, however, since real-world systems are often not simply FCFS, where all jobs have equal importance. Rather, there is often inherent scheduling in the system to allow high priority jobs to move ahead of low priority jobs. For example, the high priority jobs may carry more importance, e.g., representing users who pay more. Alternatively, high priority jobs may simply consist of those jobs with small service demand, since giving priority to short jobs reduces mean response time overall. Either way, priorities are fundamental to many modern systems.
This motivates the question of what the optimal system configuration is (a few fast servers or many slow servers) in a setting where there are different priority classes. In this paper we specifically assume two priority classes: a PH service demand distribution for each class, Poisson arrivals, and FCFS service order within each class.
The difficulty in answering the above question in the dual-priority setting is that exact analysis of mean response time for the M/PH/k queue with two priority classes is computationally difficult. In the first half of our paper, we present an approach for analyzing the M/PH/k queue with dual priority classes. Our approach extends a technique used, for example, in [25], which we call “dimensionality reduction”. Dimensionality reduction allows one to track two job classes using only a one-dimensionally infinite Markov chain rather than the standard two-dimensionally infinite Markov chain. Unfortunately, the existing dimensionality reduction does not apply to the dual-priority queue because of complications arising from the PH service times (see Section 4.1). A major part of our extension of dimensionality reduction involves invoking a method of Neuts [21] for finding moments of passage times in semi-Markov processes. We use this method to derive specialized busy period durations, opening up a whole new class of problems that dimensionality reduction can now solve. Although our analysis is an approximation, it can be made as accurate as desired, and throughout the paper its accuracy is within a few percent of simulation.
Armed with a near-exact analysis of the dual-priority queue, for the remainder of the paper we focus directly on questions involving choosing the optimal resource configuration. In particular, we are interested in the following questions:
- 1.
Under what conditions are multiple slow servers preferable to a single fast server? Is the optimal number of servers sensitive to changes in the relative arrival rates of the priority classes and changes in the variability of the service time distributions?
- 2.
Does the answer to “how many servers are optimal” differ for the different priority classes? For example, does the lower priority class prefer more or fewer servers than that preferred by the higher priority class?
- 3.
How does the optimal number of servers in a dual priority system differ from the case when all jobs have been aggregated into a single priority class?
- 4.
If one chooses a non-optimal number of servers, how does that affect the overall mean response time and the per-class mean response time?
The rest of the paper is organized as follows. In Section 2, we describe prior methods for analyzing the dual-priority queue. We also discuss prior work dealing with our question of “how many servers are best”, most of which assumes exponential service demands, and none of which deals with multiple priority classes. In Section 3, we answer question 1 above in the case of an M/PH/k queue with a single priority class. Section 4 presents our analysis of the M/PH/k queue with dual-priority classes and the validation of our analysis against simulation. In Section 5, we address all four questions above for the case of an M/PH/k dual-priority queue. Finally, we conclude in Section 6.
Section snippets
Prior work
Section 2.1 discusses prior work on multiserver systems with two priority classes. While the literature is vast, almost all of it is limited to exponential service times. Section 2.2 discusses prior work on questions related to the optimal number of servers. Here, too, the literature is vast, although all of it deals with only a single priority class, and the majority focuses on exponential service times.
Single priority class: How many servers are best?
In this section, we consider the simplified problem of determining the number of servers that minimizes mean response time under just one priority class. The M/PH/k queue is easily analyzable via matrix analytic methods [15], as its Markov chain has a state space that is infinite in only one dimension. Fig. 1 shows the Markov chain that we use for analyzing an M/PH/2/FCFS queue via matrix analytic methods.
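The matrix analytic machinery referenced here can be sketched on the simplest PH example, the M/E2/1 queue (Poisson arrivals, two-stage Erlang service), whose chain is a quasi-birth-and-death (QBD) process. The sketch below, with illustrative parameter values, computes Neuts' R matrix by the standard fixed-point iteration and checks that it solves the defining quadratic; the tiny 2x2 matrix helpers are ours, not the paper's:

```python
# Matrix-analytic sketch: Neuts' R matrix for an M/E2/1 queue.
# State within each level n >= 1: the current Erlang service stage (1 or 2).
lam, nu = 0.3, 1.0          # arrival rate; rate of each Erlang stage (mean job size 2.0)

A0 = [[lam, 0.0], [0.0, lam]]                    # level up: an arrival, phase unchanged
A1 = [[-(lam + nu), nu], [0.0, -(lam + nu)]]     # within level: stage 1 -> stage 2
A2 = [[0.0, 0.0], [nu, 0.0]]                     # level down: stage 2 completes, next job starts in stage 1

def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def mat_add(X, Y):
    return [[X[i][j] + Y[i][j] for j in range(2)] for i in range(2)]

def mat_inv(X):
    det = X[0][0] * X[1][1] - X[0][1] * X[1][0]
    return [[X[1][1] / det, -X[0][1] / det], [-X[1][0] / det, X[0][0] / det]]

def mat_neg(X):
    return [[-X[i][j] for j in range(2)] for i in range(2)]

# Fixed-point iteration R <- -(A0 + R^2 A2) A1^{-1}, monotone from R = 0.
A1_inv = mat_inv(A1)
R = [[0.0, 0.0], [0.0, 0.0]]
for _ in range(2000):
    R = mat_mul(mat_neg(mat_add(A0, mat_mul(mat_mul(R, R), A2))), A1_inv)

# R must solve the QBD quadratic A0 + R A1 + R^2 A2 = 0; check the residual.
residual = mat_add(mat_add(A0, mat_mul(R, A1)), mat_mul(mat_mul(R, R), A2))
print(max(abs(residual[i][j]) for i in range(2) for j in range(2)))
```

The stationary queue-length distribution then decays geometrically in R level by level, which is what makes the one-dimensionally infinite chain tractable.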
Fig. 2(a) shows the optimal number of servers as a function of the load
Analysis of the M/PH/k with dual priorities
In this section, we describe our analysis of the mean response time in M/PH/k queues having two priority classes (high priority and low priority), where high priority jobs have preemptive-resume priority over low priority jobs. Since the mean response time of the high priority jobs can be analyzed as an M/PH/k queue with a single priority class (as in the previous section), we concentrate here on the mean response time of the low priority jobs. Our goal in this section is to present a
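As a single-server point of reference for the preemptive-resume structure described here, the classical M/M/1 two-class formulas (a textbook simplification, not the paper's M/PH/k analysis) already show the asymmetry: high-priority jobs see a plain M/M/1, while low-priority jobs pay for both the high-priority load and the total load. The parameter values below are illustrative:

```python
# Classical M/M/1 preemptive-resume priority with two classes and a common
# exponential size distribution (a simplification, not the paper's M/PH/k method).
lam_h, lam_l, mu = 0.3, 0.3, 1.0
rho_h = lam_h / mu
rho = (lam_h + lam_l) / mu

# High-priority jobs never see low-priority work: a plain M/M/1 at rate lam_h.
t_high = 1.0 / (mu - lam_h)

# Low-priority jobs: service inflated by high-priority preemptions, plus the
# delay of the total backlog found on arrival.
t_low = (1.0 / mu) / (1.0 - rho_h) + (rho / mu) / ((1.0 - rho_h) * (1.0 - rho))

# Sanity check: with identical exponential sizes the scheduling order does not
# change the total number in system, so Little's law must recover the plain
# FCFS M/M/1 value N = rho / (1 - rho).
n_total = lam_h * t_high + lam_l * t_low
print(t_high, t_low, n_total)
```

The same conservation check no longer pins down the per-class values once the two classes have different PH distributions or there are multiple servers, which is where the paper's dimensionality-reduction analysis is needed.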
How many servers are best?
In the introduction to this paper, we set out to answer four questions. We have already addressed the first of these. In answer to question 1, we have seen that, in both the case of the single priority class (Section 3) and in the case of dual-priority classes (Section 4.3), multiple slow servers can be preferable to a single fast server. Further, we have seen that the preference depends on service time variability and system load. For the case of a single priority class this dependence is
Summary and future work
We have presented the first accurate (within a few percent of simulation), computationally efficient analysis of the M/PH/k queue with dual priorities, which allows for phase-type service time distributions. Our method is conceptually simple: we approximate a 2D-infinite Markov chain with a 1D-infinite Markov chain by leveraging QBD passage time results to compute the various types of busy period durations needed by our approximation. Furthermore, our method is fast — requiring less than one
Acknowledgements
This work was supported by NSF Grant CCR-0311383 and grant sponsorship from IBM Corporation.
References (30)
- et al., Analysis of nonpreemptive priority queues with multiple servers and two priority classes, European Journal of Operational Research (1999)
- et al., Analysis of a finite capacity nonpreemptive priority queue, Computers and Operations Research (1984)
- et al., Multiprocessor systems with preemptive priorities, Performance Evaluation (1981)
- Approximate analysis for heterogeneous multiprocessor systems with priority jobs, Performance Evaluation (1992)
- et al., Analysis of cycle stealing with switching costs and thresholds, Performance Evaluation (2005)
- et al., Systems with multiple servers under heavy-tailed workloads, Performance Evaluation (2005)
- A. Bondi, J. Buzen, The response times of priority classes under preemptive resume in M/G/m queues, in: Proceedings of...
- et al., Calculating the equilibrium distribution in level dependent quasi-birth-and-death processes, Stochastic Models (1995)
- et al., The response times of priority classes under preemptive resume in M/M/m queues, Operations Research (1983)
- Optimal workload allocation in open queueing networks in multiserver queues, Management Science (1992)
- Several results on the design of queueing systems, Operations Research
- Waiting-time distribution of a multi-server, priority queueing system, Operations Research
- Analysis of a multiserver queue with two priority classes and (M,N)-threshold service schedule II: preemptive priority, Asia-Pacific Journal of Operations Research
- Analysis of a non-preemptive priority multiserver queue, Advances in Applied Probability
- On a preemptive Markovian queue with multiple servers and two priority classes, Mathematics of Operations Research
Cited by (16)
- An efficient computation algorithm for a multiserver feedback retrial queue with a large queueing capacity, 2010, Applied Mathematical Modelling
- Multi-class resource sharing with preemptive priorities, 2018, Probability in the Engineering and Informational Sciences
- Differential approximation and sprinting for multi-priority big data engines, 2019, Middleware 2019 - Proceedings of the 2019 20th International Middleware Conference
Adam Wierman is currently a doctoral student at Carnegie Mellon University. He received a BS with University Honors in Computer science and Mathematics with minors in Psychology and Statistics from Carnegie Mellon University in 2001. He is a recipient of the NSF Graduate Research Fellowship, the best student paper award at the ACM Sigmetrics conference, and multiple teaching awards, including the Alan J. Perlis Student Teaching Award. He currently works on the analysis of scheduling policies for queueing systems. His main focus is on understanding the impact of scheduling heuristics on efficiency and fairness.
Takayuki Osogami is a researcher at IBM Tokyo Research Laboratory. He received a B.Eng. degree in Electronic Engineering from the University of Tokyo in 1998 and a Ph.D. in Computer Science from Carnegie Mellon University in August 2005. His current research interest includes modeling, analysis, simulation, and optimization of stochastic systems, with an emphasis on an analysis of stochastic processes and its applications to the performance analysis and optimization of multi-server systems. In 1998–2001, he was also at IBM Tokyo Research Laboratory, where the principal project was development of optimization algorithms.
Mor Harchol-Balter is an Associate Professor of Computer Science at Carnegie Mellon University. She received her doctorate from the Computer Science department at the University of California at Berkeley under the direction of Manuel Blum. She is a recipient of the McCandless Chair, the NSF CAREER award, the NSF Postdoctoral Fellowship in the Mathematical Sciences, multiple best paper awards, and several teaching awards, including the Herbert A. Simon Award for Teaching Excellence.
Professor Harchol-Balter is heavily involved in the ACM SIGMETRICS research community. Her work focuses on designing new scheduling/resource allocation policies for various distributed computer systems including web servers, distributed supercomputing servers, networks of workstations, and database systems. Her work spans both queueing analysis and implementation and emphasizes integrating measured workload distributions into the problem solution.
Alan Scheller-Wolf teaches in the Operations Management area at Tepper School of Business of Carnegie Mellon University. He earned a B.S. in Mathematics and Computational Sciences and a B.A. in Art History from Stanford University. He did his doctoral studies in Operations Research at Columbia University, where he earned M.S., M.Phil., and a Ph.D. degrees. Between his undergraduate and graduate work, he served as a mathematics instructor in Botswana, Africa with the United States Peace Corps.
Professor Scheller-Wolf’s research focuses on stochastic processes, and how they can be used to estimate and improve the performance of manufacturing, service and communications systems. He is particularly interested in how systems operate when there are capacity allocation choices or constraints, alternate sourcing or delivery options, perishable inventory, or highly variable input. Professor Scheller-Wolf has or is currently working on operations consulting projects for Caterpillar, The American Red Cross, John Deere, Intel, Equitable Resources, and Air Products International. He currently serves on the editorial boards of Operations Research Letters, IIE Transactions, Production and Operations Management, Management Science, and Operations Research.