Dynamic thread assignment in web server performance optimization
Introduction
The rise of the Internet and broadband communication technologies has boosted the use of web-based services that combine and integrate information from geographically distributed information systems. As a consequence, popular web sites are expected to handle huge numbers of simultaneous requests without noticeable degradation of response-time performance. Moreover, web servers must perform significant CPU- and disk-I/O-intensive processing, caused by the emergence of server-side scripting technologies (e.g., Java servlets, Active Server Pages, PHP). Furthermore, web pages involving recent and personalized information (location information, headline news, hotel reservations) are created dynamically on the fly and hence are not cacheable. This limits the effectiveness of the caching infrastructures that are usually deployed to boost the response-time performance of commercial web sites and to limit bandwidth consumption. At the same time, as a result of recent advances in wired networking technology, ample core network bandwidth is usually available at reasonable prices. As a consequence of these developments, web servers tend to become performance bottlenecks in many cases. These observations raise the need for web-based service providers to control the performance of their web servers.
Web servers are typically equipped with a pool of threads. In many cases, a request is composed of a number of processing steps that are performed in sequential order. For example (see Fig. 1), an HTTP GET request may require processing in several steps: a document-retrieval step followed by a sequence of script-processing steps to create dynamic content. Similarly, an HTTP POST request may require a document-processing step and several database-update queries. To handle the incoming requests, web servers usually implement a number of thread pools, each dedicated to a specific processing step [1], [2].
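The per-step thread pools described above can be sketched as a small pipeline. This is an illustrative toy, not the paper's implementation: `make_stage`, the workers, and the pool sizes are all hypothetical names chosen for the example.

```python
import queue
import threading

def make_stage(worker, in_q, out_q, pool_size):
    """A thread pool dedicated to one processing step; a None request
    acts as a shutdown signal that is propagated to sibling threads."""
    def run():
        while True:
            req = in_q.get()
            if req is None:
                in_q.put(None)   # let the other threads of this pool exit too
                break
            out_q.put(worker(req))
    threads = [threading.Thread(target=run) for _ in range(pool_size)]
    for t in threads:
        t.start()
    return threads

# Two-step pipeline, mimicking Fig. 1: document retrieval, then
# script processing (workers and pool sizes are placeholders).
q0, q1, q2 = queue.Queue(), queue.Queue(), queue.Queue()
retrieval = make_stage(lambda r: r + ":doc", q0, q1, pool_size=2)
scripting = make_stage(lambda r: r + ":html", q1, q2, pool_size=3)

for i in range(3):
    q0.put(f"req{i}")
pages = [q2.get() for _ in range(3)]

q0.put(None)                 # drain and stop stage 1 ...
for t in retrieval:
    t.join()
q1.put(None)                 # ... then stage 2
for t in scripting:
    t.join()
```

Each request traverses both stages in order, while the two pools process different requests concurrently, which is exactly the software-layer structure the model below abstracts.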
The performance of the web server is largely dependent on the thread-management policy. This policy may be either static (i.e., with a fixed number of threads—possibly of different types) or dynamic (i.e., where threads may be created or killed depending on the state of the server). Traditionally, many web servers implement a simple static thread-assignment policy, where the size of the thread pool (i.e., the maximum number of threads that can simultaneously execute processing steps) is a configurable system parameter. This leads to a trade-off regarding the proper dimensioning of thread pools to optimize performance: on the one hand, assigning too few threads may lead to relative starvation of processing power, creating a performance bottleneck that may increase the average response time of requests, particularly when the workload increases. On the other hand, if the total number of threads running on a single hardware component is too large, performance degradation may occur due to superfluous context switching overhead and memory or disk I/O activity. Nowadays, more efficient thread policies are widely implemented. In order to effectively react to sudden bursts of transaction requests, many web servers implement simple dynamic thread-management algorithms that allow threads to be created or killed, depending on the actual number of active threads. However, even though the implementation of these thread policies is common practice, a thorough understanding of the implications of the proper choice of thread-assignment policies and the settings of the parameters on the performance of the web server is mostly lacking. In particular, the trade-off between relative starvation of processing power in the case of too few threads and the performance degradation in the case of too many threads is not fully understood (see [3] for recent results on software bottlenecks). 
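A simple dynamic thread-management rule of the kind mentioned above can be phrased as a watermark heuristic. The thresholds and the function below are illustrative assumptions, not taken from the paper:

```python
def resize_pool(current, queued, idle, min_threads=4, max_threads=64):
    """Watermark heuristic: spawn threads while requests are waiting,
    retire threads when many sit idle. All thresholds are illustrative."""
    if queued > 0 and current < max_threads:
        # grow towards the backlog, but never beyond the hard cap
        return current + min(queued, max_threads - current)
    if idle > current // 2 and current > min_threads:
        # shrink gradually when more than half the pool is idle
        return max(min_threads, current - idle // 2)
    return current
```

Such a rule reacts only to the *number* of active threads and waiting requests; as argued below, it ignores the service-time distributions, which is precisely the information the optimal policies of this paper exploit.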
Moreover, the commonly used thread policies do not take into account the probability distribution of the service times required by the different requests, while significant performance improvements can be obtained by doing so.
A key feature of multi-threaded web servers is that the threads typically share common hardware (e.g., a CPU and disk) with a limited amount of capacity. This naturally leads to the formulation of a two-layered tandem of multi-server queues, where the active threads share the processor capacity in a processor-sharing (PS) fashion; i.e., when n threads are active at some moment in time, each of these threads receives a fraction 1/n of the total processor capacity [4]. In this model, transaction requests are represented by customers, threads are represented by servers, and response times are represented by the sojourn times of the customers. To identify optimal thread-assignment policies, we describe the evolution of the system as a Markov decision model and derive optimal thread policies from the properties of the relative value function. In doing so, we show that the structure of the optimal thread policy strongly depends on the service-time distributions of the different processing steps in the web server; in practice, these distributions can be monitored and updated on-the-fly.
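The egalitarian PS discipline can be made concrete with a small worked example. The sketch below computes departure times under PS for jobs that are all present at time 0, a simplifying assumption made only for this illustration:

```python
def ps_departure_times(service_reqs, capacity=1.0):
    """Departure times under egalitarian processor sharing, assuming
    all jobs are present at time 0 and no further arrivals occur.
    Each of the n active jobs is served at rate capacity / n."""
    order = sorted((r, i) for i, r in enumerate(service_reqs))
    t, prev, active = 0.0, 0.0, len(order)
    done = [0.0] * len(service_reqs)
    for r, i in order:
        # the next job to finish needs (r - prev) more work,
        # delivered at the shared per-job rate capacity / active
        t += (r - prev) * active / capacity
        done[i] = t
        prev, active = r, active - 1
    return done
```

For instance, two jobs each requiring 1 unit of work on a unit-capacity processor both depart at time 2, since each receives rate 1/2 while both are present.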
An interesting feature of this model is that it has a two-layered structure, modeling the complex interaction between contention at the hardware (CPU, disk, memory) layer and the software entities (threads) layer. At the software layer, the processing steps comprising a request are processed by different (say N) types of threads. However, the active threads effectively share the underlying resource: the more threads that are active, the smaller the processor capacity that is assigned to each thread. In this way, the thread is no longer an autonomous entity operating at a fixed rate; instead, the processing rate of each thread continuously changes over time. Evidently, for N = 1, the model coincides with the classical processor-sharing discipline; but for N ≥ 2, the processing speed of one thread pool depends on the state of the other thread pools. This type of interaction makes the model rather complicated, and highly challenging from a methodological point of view.
In this paper our objective is to construct a model that, on the one hand captures key features of web servers, but on the other hand is simple enough to allow for an analytic analysis. To this end, we make a number of simplifying assumptions: (1) each request traverses a fixed and known path, and (2) the active threads effectively share a single bottleneck hardware resource (CPU, disk) in a PS fashion (see also page 20 of [5]). Although these assumptions may not be entirely realistic in some practical situations (see also Section 5 for a discussion about the model assumptions and Section 6 for model extensions) the results presented in this paper provide important initial insights into the structure of optimal thread assignment policies for web servers (see also Section 6). In this context, the contribution of the paper should be viewed as a first important step to the development of thread policies for more complex web server models.
Although the theory of job scheduling with autonomous independent servers is well-matured, only a few papers in the literature deal with scheduling of web servers. Harchol-Balter et al. [6], [7], [8] and Crovella et al. [9] study scheduling policies that improve the response-time performance of web servers with static web pages, provided the size of a web page is known a priori; for this type of model, the results show that the classical Shortest Remaining Processing Time (SRPT) policy is very effective [10]. In contrast to the present paper, it should be noted that the results in [6] are based on the assumption that the network interface, rather than the web server itself, is the performance bottleneck; this leads to fundamentally different performance models than the one considered in the present paper. In this context, the contribution of the present paper complements the results obtained in the above references. Menascé [11] gives an overview of issues involved in modeling web servers. Cao et al. [12] propose to model a web server by a simple M/G/1/K/PS-queue, and validate the model through lab experiments. Detailed performance models for web servers, explicitly including the interaction between software and hardware contention, were proposed in [4], [1]; these modeling efforts naturally led to the formulation of two-layered queueing models.
Several other papers also focus on queueing networks with a layered structure. Rolia and Sevcik [13] propose the Method of Layers (MoL), i.e., a closed queueing-network model based on the responsiveness of client–server applications. Woodside et al. [14] propose the so-called Stochastic Rendez-Vous Network (SRVN) model to analyze the performance of application software with client–server synchronization. Ramesh and Perros [15] model a web server system where clients and servers communicate via synchronous and asynchronous communication, and where the servers form a multi-layered hierarchical structure. They propose an approximate method for calculating the mean response time based on a decomposition approach. Dilley et al. [5] describe custom instrumentation to collect workload metrics and model parameters from large-scale web servers. They develop a layered queueing model (LQM) of a web server and use the model to predict the impact of a single web server thread pool size on the server and client response times. Franks et al. [3] focus on the correct definition and detection of bottlenecks in the context of layered queueing models. Related models are the so-called coupled-processor models, i.e., multi-server models where the speed of a server at a queue depends on the number of servers at the other queues (see [16], [17], [18]). For a two-layered network of two multi-server queues with processor sharing, remarkable results on the per-queue stability were obtained in [19].
Although a lot of progress has been made in understanding and improving the performance of web servers in the references outlined above, to the best of the authors’ knowledge, the problem of dynamic control of threads in layered queueing networks has not been addressed in the literature.
In this paper, we model a web server by a two-layered queueing network with a single processor-shared resource. We describe the evolution of the system as a Markov decision process from which we obtain simple and readily implementable dynamic thread-assignment policies that minimize the expected response time of the requests. The service-time distributions are modeled by the class of phase-type distributions, which is a broad class of distributions and also allows one to study the impact of heavy-tailed distributions. The results show not only that, but also how the optimal policy depends on the service-time distributions at each of the processing steps. The proposed policy uses monitored information on both the number of active threads and the probability distribution of the required service time per request. Our results show that the optimal dynamic thread-assignment policies yield strong reductions in the response times. To validate the model, we have tested the performance of our policies in an experimental setting on an Apache web server. The experimental results show that our policies indeed lead to significant reductions of the response time, which demonstrates the practical usefulness of the results.
The contribution of this paper is of both methodological and practical interest. First, from an application point of view, we derive explicit dynamic optimal thread-assignment policies for web servers, and show by experiments with an Apache web server that these policies indeed lead to significant reductions of the response times of the web server. Second, from a methodological point of view, the optimal thread-assignment policies derived in this paper are among the few exact detailed results for queueing networks with interacting servers, a class of queueing models for which hardly any exact results are known today. As such, the results derived in this paper can be seen as pioneering analytical contributions in the field of multi-layered queueing models.
The remainder of this paper is organized as follows. In Section 2 we formulate the model. Section 3 derives optimal dynamic thread-assignment policies. In Section 4 we consider numerical experiments and evaluate them on an Apache web server. In Section 5 we discuss the model assumptions and the computational complexity of the thread-assignment policies. We conclude in Section 6 and give ideas for further research directions.
Model description
In this section we model the problem of dynamic thread assignment in the context of a multi-layered queueing system with a shared PS resource. For this purpose, consider a network of N queues in tandem with a common shared processor for serving arriving requests. Requests arrive at the first queue according to a Poisson process with rate λ. At each queue, threads can be spawned which may be assigned to a request. When a request is assigned to a thread at queue i, it receives service with
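The layered tandem can be illustrated with a toy discrete-event simulation: each request passes through a fixed sequence of exponential processing steps, and all in-service requests share unit capacity in PS fashion. This is a simplified sketch of the model class, not the paper's analysis; it assumes every admitted request always holds a thread (no thread-assignment control), and all parameters are illustrative.

```python
import random

def simulate_tandem_ps(lam, mus, horizon, seed=0):
    """Mean sojourn time in a tandem of exponential steps (rates mus)
    under egalitarian PS over all active requests; Poisson(lam) arrivals.
    Simplifying assumption: no thread-assignment policy is applied."""
    rng = random.Random(seed)
    t, active = 0.0, {}              # id -> [arrival_time, stage, remaining_work]
    next_arrival, next_id = rng.expovariate(lam), 0
    sojourns = []
    while t < horizon:
        n = len(active)
        if n:
            # the job with least remaining work finishes its stage first;
            # at shared rate 1/n it needs remaining * n time units
            cid, cjob = min(active.items(), key=lambda kv: kv[1][2])
            t_done = t + cjob[2] * n
        else:
            t_done = float("inf")
        if next_arrival <= t_done:   # next event: an arrival
            dt = next_arrival - t
            for job in active.values():
                job[2] -= dt / n     # everyone worked at rate 1/n meanwhile
            t = next_arrival
            active[next_id] = [t, 0, rng.expovariate(mus[0])]
            next_id += 1
            next_arrival = t + rng.expovariate(lam)
        else:                        # next event: a stage completion
            dt = t_done - t
            for job in active.values():
                job[2] -= dt / n
            t = t_done
            cjob[1] += 1
            if cjob[1] == len(mus):  # last stage done: request departs
                sojourns.append(t - active.pop(cid)[0])
            else:                    # move on to the next processing step
                cjob[2] = rng.expovariate(mus[cjob[1]])
    return sum(sojourns) / len(sojourns) if sojourns else float("nan")
```

As a sanity check, with λ = 0.5 and two steps of rate 2 (total mean service 1, load 0.5), the M/G/1-PS formula E[T] = E[S]/(1 − ρ) predicts a mean sojourn time around 2, which the simulation reproduces approximately.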
Dynamic thread management
In this section, we focus on dynamic thread assignment. We determine, using dynamic programming, optimal policies minimizing the expected response time per request. The performance of the optimal policies is compared to the performance of policies that only serve requests based on the number of threads outstanding. A specific example of the latter case is the policy that serves one request with only one outstanding thread until it leaves the system, resulting in a first-come-first-served (FCFS)
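The dynamic-programming machinery used in this section can be sketched generically. The routine below is standard value iteration on a discounted finite Markov decision process; it is not the paper's exact uniformized model, and the data layout (`transition`, `cost`) is an assumption made for the example:

```python
def value_iteration(transition, cost, gamma=0.95, tol=1e-9):
    """Generic value iteration. transition[a][s] is a list of
    (next_state, prob) pairs and cost[a][s] the one-step cost of
    action a in state s. Returns the value function and a policy
    minimizing expected discounted cost."""
    n_states, n_actions = len(cost[0]), len(cost)
    V = [0.0] * n_states
    while True:
        # Q-values: immediate cost plus discounted expected cost-to-go
        Q = [[cost[a][s] + gamma * sum(p * V[s2] for s2, p in transition[a][s])
              for s in range(n_states)]
             for a in range(n_actions)]
        V_new = [min(Q[a][s] for a in range(n_actions)) for s in range(n_states)]
        if max(abs(x - y) for x, y in zip(V_new, V)) < tol:
            policy = [min(range(n_actions), key=lambda a: Q[a][s])
                      for s in range(n_states)]
            return V_new, policy
        V = V_new
```

In the paper's setting, states encode the numbers of requests per queue and the service phases, actions encode thread assignments, and the optimal policy is read off from the relative value function rather than computed by brute force; the sketch only conveys the underlying fixed-point iteration.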
Numerical experiments
In the previous section, we determined optimal policies for general phase-type service distributions. In this section, we compare these policies with other thread-assignment rules that are frequently used. First, for various parameter settings, we analytically show that the optimal policies outperform the simple thread-assignment rules. Then, we compare the theoretically obtained improvements with those that are obtained in an experimental setting on an Apache web server.
Discussion
In this section we discuss the computational complexity of the optimal policy derived in Theorem 3.1 and discuss possible model extensions.
Remark 5.1 (Computational Complexity). The optimal policy of Theorem 3.1 is explicit for the case of exponentially and Erlang distributed service times. For other service distributions, the optimal policy can be computed efficiently by a recursive scheme starting with queue N and working backwards to queue 1. Thus, the decision rule for queue i does not depend on the states of queues j < i. This
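The backward recursion described in this remark has a simple skeleton: the rule for a queue is built only from the rules of the queues after it, so a single pass from the last queue to the first suffices. The function names below (`compute_rules`, `solve_queue`) are hypothetical placeholders for illustration:

```python
def compute_rules(n_queues, solve_queue):
    """Backward-recursion sketch: solve_queue(i, downstream_rules)
    returns the decision rule for queue i, given only the rules of
    queues i+1..N, matching the scheme of Remark 5.1."""
    rules = {}
    for i in range(n_queues, 0, -1):
        downstream = {j: rules[j] for j in range(i + 1, n_queues + 1)}
        rules[i] = solve_queue(i, downstream)
    return rules
```

Because each queue is solved exactly once, the cost of the scheme grows linearly in the number of queues, on top of whatever work `solve_queue` performs per queue.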
Conclusions and further research
We have considered the problem of dynamic thread assignment in web servers such that the expected response time is minimized. This problem can be translated into a Markov decision process problem for multi-layered queueing networks, a class of queueing networks for which hardly any exact detailed results have been obtained so far. We show that for phase-type service-time distributions, the optimal policy spawns a thread for a request if the resulting expected sojourn time of that request
Acknowledgments
The authors would like to thank the anonymous referees for their useful suggestions, which have led to significant improvements of the paper.
References (23)
- et al., Web server performance measurement and modeling techniques, Performance Evaluation (1998)
- R. Hariharan, W.K. Ehrlich, P.K. Reeser, R.D. van der Mei, Performance of web servers in a distributed computing...
- et al., A decision support system for tuning Web servers in distributed object-oriented network architectures, ACM Performance Evaluation Review (2000)
- G. Franks, D.C. Petriu, C.M. Woodside, J. Xu, P. Tregunno, Layered bottlenecks and their mitigation, in: Proc. of 3rd...
- et al., Web server performance modeling, Telecommunication Systems (2001)
- et al., Size-based scheduling to improve web performance, ACM Transactions on Computer Systems (2003)
- N. Bansal, M. Harchol-Balter, Analysis of SRPT scheduling: Investigating unfairness, in: Proceedings of ACM Sigmetrics...
- et al., SRPT scheduling for web servers
- M.E. Crovella, R. Frangioso, M. Harchol-Balter, Connection scheduling in web servers, in: Proceedings USENIX symposium...
- The queue M/G/1 with the shortest remaining processing time discipline, Operations Research (1966)
- Web performance modeling issues, International Journal of High Performance Computing Applications
Wemke van der Weij (1982) received her B.Sc. degree in Econometrics and Operations Research and her M.Sc. in Operations Research, both from the University of Amsterdam. In 2005 she started a Ph.D. program at the Center for Mathematics and Computer Science (CWI), also being a member of the Optimization of Business Processes group at the VU University, Amsterdam. Her research interests are performance analysis of stochastic networks, and in particular the analysis of queueing networks with shared resources. She expects to defend her Ph.D. dissertation in spring 2009.
Sandjai Bhulai (1976) received his M.Sc. degrees in Mathematics and in Business Mathematics and Informatics, both from the VU University Amsterdam, The Netherlands. He carried out his Ph.D. research on "Markov decision processes: the control of high-dimensional systems" at the same university, for which he received his Ph.D. degree in 2002. After that he was a postdoctoral researcher at Lucent Technologies, Bell Laboratories, as an NWO Talent Stipend fellow. In 2003 he joined the Mathematics department at the VU University Amsterdam, where he is an assistant professor in Applied Probability and Operations Research. His primary research interests are in the general area of stochastic modeling and optimization, in particular the theory and applications of Markov decision processes. His favorite application areas include telecommunication networks and call centers. He is currently involved in the control of time-varying systems, partial information models, dynamic programming value functions, and reinforcement learning.
Rob van der Mei (1966) received his M.Sc. degrees in Mathematics and in Econometrics, both from the VU University Amsterdam. In 1995 he received his Ph.D. degree from Tilburg University, The Netherlands. After that, he worked for over a decade as a consultant and researcher in the telecommunications industry, for the Royal Dutch PTT, AT&T Labs and TNO. In 2004 he joined the Centre for Mathematics and Computer Science (CWI), where he is currently the head of the Department of Probability and Stochastic Networks, and the leader of the research theme Societal Logistics. He also has a part-time appointment as a full professor at the VU University, Amsterdam, The Netherlands, where he is responsible for research and education in the field of communication networks, with a particular focus on performance aspects. He is a member of the editorial boards of Performance Evaluation and the AEUE Journal on Electronics and Communications. His research interests include performance analysis of communication networks, health care logistics and queueing theory; more recently, he has started several projects in revenue management, grid computing and sensor networks. He teaches a variety of industry-oriented courses on performance management and design of ICT systems for system architects, and consults for ICT companies on a regular basis. He has published over 80 refereed papers in the field.