Characterization and Evaluation of Cache Hierarchies for Web Servers

World Wide Web

Abstract

As Internet usage continues to expand rapidly, Internet servers must be designed carefully to achieve high performance and end-user satisfaction. The memory system remains a significant performance bottleneck for Internet servers employing multi-GHz processors. In this paper, our aim is two-fold: (1) to characterize the cache/memory performance of web server workloads and (2) to propose and evaluate cache design alternatives for future web servers. We chose SPECweb99 as the representative web server workload, and our entire characterization and evaluation methodology is based on our CASPER simulation framework. We begin by exploring the processor cache design space for single- and dual-processor servers. Based on our observations, we then evaluate other cache hierarchy alternatives such as chipset caches, coherence filters and decompressed page stores. We show the sensitivity of these components to basic organization parameters such as cache size, line size and degree of associativity. We also present the performance implications of routing memory requests initiated by I/O devices through these caches. Based on detailed simulation data and its implications for system-level performance, this paper shows that chipset caches have significant potential for improving future web server performance.
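
To make the kind of sensitivity study described in the abstract concrete, the following is a minimal sketch of a trace-driven, set-associative cache model swept over cache size. It is not the CASPER framework and does not reproduce any result from the paper. The class `SetAssociativeCache`, the function `sweep`, the LRU replacement policy and the tiny address trace are all assumptions introduced here for illustration.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Toy trace-driven cache model with LRU replacement (illustrative only)."""

    def __init__(self, size_bytes, line_bytes, ways):
        if size_bytes % (line_bytes * ways) != 0:
            raise ValueError("size must be a multiple of line_bytes * ways")
        self.line_bytes = line_bytes
        self.ways = ways
        self.num_sets = size_bytes // (line_bytes * ways)
        # One OrderedDict per set: keys are tags, insertion order tracks recency.
        self.sets = [OrderedDict() for _ in range(self.num_sets)]
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Look up one byte address; return True on a hit, False on a miss."""
        line = address // self.line_bytes
        index = line % self.num_sets
        tag = line // self.num_sets
        entries = self.sets[index]
        if tag in entries:
            entries.move_to_end(tag)      # mark as most recently used
            self.hits += 1
            return True
        if len(entries) >= self.ways:
            entries.popitem(last=False)   # evict the least recently used tag
        entries[tag] = True
        self.misses += 1
        return False

    def miss_ratio(self):
        total = self.hits + self.misses
        return self.misses / total if total else 0.0


def sweep(trace, sizes, line_bytes, ways):
    """Return {cache size in bytes: miss ratio} for one address trace."""
    results = {}
    for size in sizes:
        cache = SetAssociativeCache(size, line_bytes, ways)
        for address in trace:
            cache.access(address)
        results[size] = cache.miss_ratio()
    return results


if __name__ == "__main__":
    # Hypothetical trace: four 64-byte lines touched twice in a row.
    trace = [0, 64, 128, 192] * 2
    for size, ratio in sweep(trace, [128, 256], line_bytes=64, ways=1).items():
        print(f"{size} B direct-mapped: miss ratio {ratio:.2f}")
```

The same loop can be repeated over `line_bytes` or `ways` to examine the other two organization parameters. A realistic study would replace the toy trace with reference streams captured from the workload of interest.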

Iyer, R. Characterization and Evaluation of Cache Hierarchies for Web Servers. World Wide Web 7, 259–280 (2004). https://doi.org/10.1023/B:WWWJ.0000028180.97418.53

  • DOI: https://doi.org/10.1023/B:WWWJ.0000028180.97418.53
