Evaluating IA-32 web servers through Simics: a practical experience

https://doi.org/10.1016/j.sysarc.2004.09.003

Abstract

Nowadays, the use of multiprocessor systems is no longer limited to typical scientific applications; these systems are increasingly being used to execute commercial applications, such as databases and web servers. It has therefore become essential to study the behavior of multiprocessor architectures under commercial workloads. To accomplish this, we need simulators able to model not only the CPU, memory and interconnection network but also other aspects that are critical in the execution of commercial workloads, such as the I/O subsystem and the operating system. In this paper, we present our first experiences using Simics, a simulator that allows full-system simulation of multiprocessor architectures covering all the aspects previously mentioned. Using Simics, we carry out a detailed performance study of a static web content server, showing how changes in some architectural parameters, such as the number of processors and the cache size, affect final performance. The results we have obtained corroborate the intuition that a dual-processor web server outperforms a single-processor one and, at the same time, allow us to identify some limitations of Simics. Finally, we compare these results with those obtained on real machines.

Introduction

Multiprocessor systems have traditionally been used in scientific areas: weather study and modelling, universe modelling, molecular algorithms, etc. These kinds of problems can be easily reproduced and studied using user-level simulators like RSIM [8] and scientific benchmarks such as those provided by the SPLASH-2 suite [17].

However, multiprocessor systems are also currently being used to execute other kinds of applications, usually known as commercial applications, among which we find web servers, for example. It is well known that the importance of the Internet has grown exponentially in recent years, to the point of becoming a part of our everyday lives. Nowadays, all medium-sized and large companies, and even small ones, have a web portal that serves as a “shop window” for customers around the world. The same holds for all types of organizations: governments, academic institutions, etc. Large organizations, which expect to receive a huge number of user connections every day, need a powerful server, which is usually implemented as a multiprocessor.

As a consequence of the increasing use of multiprocessors in this field, accurately simulating multiprocessor architectures running web servers becomes important. Unlike scientific applications, commercial workloads have some characteristics that make their simulation challenging. In particular, the activity of the operating system is very important in these applications, as is the interaction with the memory hierarchy, the storage system and the communication network. The simulators used in these studies must therefore model all these aspects if accurate simulation results are to be obtained. Simics [10] is a full-system simulator that can simulate the operating system, memory hierarchy, storage, buses, microprocessor, communication network, and so on. Simics is increasingly being used as a platform for simulating multiprocessor architectures running commercial applications, and it is currently used in more than 300 universities all over the world.

In this paper, we study the possibilities Simics offers to characterize web servers. First, we describe the main characteristics of the simulator. Second, using Simics we evaluate the performance of a dual-processor web server and compare it with the performance obtained when using a single-processor one. The results obtained corroborate the intuition that a dual-processor web server obtains higher performance. Finally, we repeat the experiments using real machines, which allows us to highlight some of the current limitations of Simics.

The rest of the paper is organized as follows. Section 2 presents some related work on the evaluation of commercial applications with Simics. Section 3 deals with the simulator’s main characteristics. Section 4 describes the commercial workload we have used: a static web content server, with Apache as the web server and httperf as the utility that places the workload on the server. Section 5 contains the evaluation results. Section 6 compares the results obtained using Simics with those obtained using real web servers. Finally, Section 7 concludes the paper.

Related work

Until recently, the methodology used for evaluating commercial workloads on multiprocessors consisted of first generating the memory references of the applications and then using these references to feed a user-level simulator such as RSIM [8]. For example, Ranganathan et al. [13] study the performance of On-Line Transaction Processing (OLTP) and Decision Support Systems (DSS) using this methodology.

The appearance of full-system simulators like SimOS [14] or Simics [10] has significantly

Simics simulator

Simics is a platform which allows us to develop both hardware and software, providing the necessary components for the simulation of both elements in the same context. The different functionalities offered by Simics are grouped into modules. A module is a file written in C which implements a class that defines an object type.

This tool allows several architectures (single-processor and SMPs) to be simulated, and operating systems and commercial applications to be executed on them, which can vary

Apache web server

Apache [5] is a static and dynamic web content server, although in the evaluations performed in this paper it has been used only as a static server. The server has been compiled with all the options indicated by the server development group in order to increase performance [6]. Among these options, the use of the worker multiprocessing module is recommended instead of the prefork one, which is the default option. After compiling the server with the worker module, the incoming connections

Simulation results and analysis

In this Section, we present the results that have been obtained using Simics. First of all, we show the variations in Apache response time as a function of the number of requests that are simultaneously sent to the server. Using this metric, we compare the three hardware configurations presented. Then we provide detailed statistics of the CPU and caches for each one of the configurations.
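The first metric above, response time as a function of the number of simultaneous requests, can be illustrated with a minimal sketch. It is not the setup used in the paper (which relies on Apache, httperf and Simics); it assumes a small static server from the Python standard library running on the loopback interface, and the names `StaticHandler`, `timed_get`, `mean_response_time` and `sweep` are hypothetical helpers introduced only for illustration.

```python
import http.client
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class StaticHandler(BaseHTTPRequestHandler):
    """Stand-in for a static web content server (assumption: not Apache)."""

    BODY = b"<html><body>static page</body></html>"

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(self.BODY)))
        self.end_headers()
        self.wfile.write(self.BODY)

    def log_message(self, *args):
        # Silence per-request logging so it does not distort the timings.
        pass


def timed_get(host, port, path="/"):
    """Return the wall-clock time in seconds of one complete GET request."""
    start = time.perf_counter()
    conn = http.client.HTTPConnection(host, port, timeout=10)
    try:
        conn.request("GET", path)
        response = conn.getresponse()
        response.read()
        if response.status != 200:
            raise RuntimeError(f"unexpected status {response.status}")
    finally:
        conn.close()
    return time.perf_counter() - start


def mean_response_time(host, port, concurrency):
    """Issue `concurrency` requests at once and average their latencies."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        futures = [pool.submit(timed_get, host, port) for _ in range(concurrency)]
        latencies = [f.result() for f in futures]
    return sum(latencies) / len(latencies)


def sweep(levels=(1, 2, 4)):
    """Map each concurrency level to the mean response time in seconds."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), StaticHandler)  # port 0: any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        return {n: mean_response_time("127.0.0.1", server.server_port, n)
                for n in levels}
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    for level, seconds in sweep().items():
        print(f"{level:>3} simultaneous requests: {seconds * 1000:.2f} ms")
```

The absolute numbers produced by such a sketch depend entirely on the host machine and say nothing about the simulated configurations; only the shape of the measurement (one mean latency per concurrency level) mirrors the metric described above.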

Clearly, this comparison could have been carried out using real computers (as we will see later); however,

Comparing simulation results to real servers

Having seen how Simics can help us analyse the behavior of a commercial web server, we now want to check how accurate the results provided by the simulator are. To do so, we have repeated the experiments presented in Section 5, this time on real computers. In particular, we have evaluated only the single-processor architecture with an L2 cache of 512 KB, and the dual-processor server. Hardware configurations of the computers used in these experiments

Conclusions

Multiprocessor systems are increasingly being used for executing commercial applications, such as databases and web servers, so it becomes essential to study the behavior of multiprocessor architectures under commercial workloads. For this, we need simulators able to model not only the CPU, memory and interconnection network but also other aspects that are critical in the execution of commercial workloads, such as I/O subsystem and operating system.

In this paper we have introduced the

Acknowledgments

The authors would like to thank the anonymous referees for their comments and suggestions, which have helped to improve the quality of the paper. This work has been supported in part by the Spanish Ministry of Ciencia y Tecnología and the European Union (Feder Funds) under grant TIC2003-08154-C06-03.

Francisco J. Villa received the MS degree in Computer Science in 2003 from the University of Murcia in Spain. Since 2003 he has been a PhD student at the Research Group on Parallel Computing Architecture, working on evaluating and designing high-performance memory hierarchies for multiprocessor-on-a-chip architectures. His research interests are multiprocessor memory systems, chip-multiprocessor architectures, and power-aware cache-coherence protocol design.

References (18)

  • M.E. Acacio, J. González, J.M. García, J. Duato, A novel approach to reduce L2 miss latency in shared-memory...
  • A.R. Alameldeen et al., Simulating a $2M Commercial Server on a $2K PC, IEEE Computer (2003)
  • A.R. Alameldeen, C.J. Mauer, M. Xu, P.J. Harper, M.M.K. Martin, D.J. Sorin, M.D. Hill, D.A. Wood, Evaluating...
  • A.R. Alameldeen, D.A. Wood, Variability in architectural simulations of multi-threaded workloads, in: 9th International...
  • Apache HTTP Server Project. Available from...
  • Apache Performance Notes. Available from...
  • Y. Hu, A. Nanda, Q. Yang, Measurement, analysis and performance improvement of the Apache web server, in: 18th IEEE...
  • C.J. Hughes et al., RSIM: simulating shared-memory multiprocessors with ILP processors, IEEE Computer (2002)
  • M. Karlsson, K.E. Moore, E. Hagersten, D.A. Wood, Memory system behavior of Java-based middleware, in: 9th Annual...
There are more references available in the full text version of this article.

José M. García received the MS and the PhD degrees in electrical engineering from the Technical University of Valencia (Spain), in 1987 and 1991, respectively. At present, Dr. García is a Full Professor at the Computer Engineering Department of the Universidad de Murcia (Spain), and also the Head of the Research Group on Parallel Computing Architecture. He specializes in computer architecture, parallel processing and interconnection networks. He has developed several courses on computer structure, peripheral devices, computer architecture, and multicomputer design. Dr. García served as Vice-dean of the School of Computer Science from 1995 to 1997, and also as Director of the Computer Engineering Department from 1998 to 2004. His current research interests lie in high-performance coherence protocols for shared-memory multiprocessor systems, and high-speed interconnection networks. He has published more than 60 refereed papers in different journals and conferences in these fields. Dr. García is a member of several international associations such as the IEEE and ACM, and also a member of some European associations (Euromicro and ATI).

Manuel E. Acacio received the MS and PhD degrees in computer science from the Universidad de Murcia, Spain, in 1998 and 2003, respectively. He joined the Computer Engineering Department, Universidad de Murcia, in 1998, where he is currently an Assistant Professor of computer architecture and technology. His research interests include prediction and speculation in multiprocessor memory systems, multiprocessor-on-a-chip architectures, and power-aware cache-coherence protocol design.