Abstract
It is well known that the increasing gap between processor and main-memory speeds is one of the primary bottlenecks to good overall computer-system performance. The traditional solution to this problem is to build small, fast memories (caches) that hold recently used data and instructions close to the processor for quicker access [64]. During the past decade, microprocessor clock rates have increased at a rate of 40% per year, while main-memory (DRAM) speeds have increased at a rate of only about 11% per year [76]. This trend has made modern computer systems increasingly dependent on caches. A case in point: disabling the cache of the VAX 11/780, a machine introduced in the late 1970s, would have increased its workload run times by a factor of only 1.6 [32], while disabling the cache of the HP 9000/735, a more recent machine introduced in the early 1990s, would cause workloads to slow by a factor of 15 [76].
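To make the widening gap concrete, the two annual rates quoted above can be compounded over a ten-year span. This is a back-of-the-envelope sketch that assumes both rates hold uniformly every year, which real technology trends only approximate:

$$
\frac{(1.40)^{10}}{(1.11)^{10}} = \left(\frac{1.40}{1.11}\right)^{10} \approx (1.261)^{10} \approx \frac{28.9}{2.84} \approx 10.2
$$

Under that assumption, processor speed pulls ahead of DRAM speed by roughly a factor of ten per decade, which helps explain why removing the cache costs so much more on the newer machine.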
References
Agarwal, A.: Analysis of cache performance for operating systems and multiprogramming. Ph.D. dissertation, Stanford University, 1989
Agarwal, A., Horowitz, M., and Hennessy, J.: An analytical cache model. ACM Transactions on Computer Systems 7(2): 184–215, 1989
Agarwal, A. and Huffman, M.: Blocking: Exploiting spatial locality for trace compaction. In Proc. of the 1990 SIGMETRICS Conf. on Measurement and Modeling of Computer Systems, Boulder, CO, ACM, 48–57, 1990
Baker, M.: Cluster Computing Review. Northeast Parallel Architectures Center (NPAC) Technical Report SCCS-748, November, 1995
Becker, J. and Park, A.: An analysis of the information content of address and data reference streams. In Proc. of the 1993 SIGMETRICS Conf. on the Measurement and Modeling of Computer Systems, Santa Clara, CA, 262–263, 1993
Bedichek, R.: The Meerkat multicomputer: Tradeoffs in multicomputer architecture. Ph.D. dissertation, University of Washington Department of Computer Science Technical Report 94-06-06, August 1994
Bedichek, R.: Talisman: fast and accurate multicomputer simulation. In Proc. of the 1995 SIGMETRICS Conf. on Measurement and Modeling of Computer Systems, 14–24, 1995
Borg, A., Kessler, R., Lazana, G., and Wall, D.: Long address traces from RISC machines: generation and analysis. DEC Western Research Lab Technical Report 89/14, 1989
Borg, A., Kessler, R., and Wall, D.: Generation and analysis of very long address traces. In Proc. of the 17th Ann. Int. Symp. on Computer Architecture, IEEE, 1990
Chen, B.: Software methods for system address tracing. In Proc. of the Fourth Workshop on Workstation Operating Systems, Napa, California, 1993
Chen, B. and Bershad, B.: The impact of operating system structure on memory system performance. In Proc. of the 14th Symp. on Operating System Principles, 1993
Chen, B.: Memory behavior of an X11 window system. In Proc. of the USENIX Winter 1994 Technical Conf., 1994
Clark, D. W., Bannon, P. J., and Keller, J. B.: Measuring VAX 8800 performance with a histogram hardware monitor. In Proc. of the 15th Ann. Int. Symp. on Computer Architecture, Honolulu, Hawaii, IEEE, 176–185, 1988
Cmelik, R. and Keppel, D.: Shade: A fast instruction-set simulator for execution profiling. University of Washington Technical Report UWCSE 93-06-06, 1993
Cmelik, B. and Keppel, D.: Shade: A fast instruction-set simulator for execution profiling. In Proc. of the 1994 SIGMETRICS Conf. on Measurement and Modeling of Computer Systems, Nashville, TN, ACM, 128–137, 1994
Cvetanovic, Z. and Bhandarkar, D.: Characterization of Alpha AXP performance using TP and SPEC Workloads. In Proc. of the 21st Ann. Int. Symp. on Computer Architecture, Chicago, IL, IEEE, 1994
Davies, P., Lacroute, P., Heinlein, J., and Horowitz, M.: Mable: A technique for efficient machine simulation. Stanford University Technical Report CSL-TR-94-636, October, 1994
Davis, H., Goldschmidt, S., and Hennessy, J.: Multiprocessor simulation and tracing using Tango. In Proc. of the 1991 Int. Conf. on Parallel Processing, 99–107, 1991
Digital: Alpha Architecture Handbook. USA, Digital Equipment Corporation, 1992
Eggers, S., Keppel, D., Koldinger, E., and Levy, H.: Techniques for efficient inline tracing on a shared-memory multiprocessor. In Proc. of the 1990 SIGMETRICS Conf. on Measurement and Modeling of Computer Systems, Boulder, CO, 37–47, 1990
Emer, J. and Clark, D.: A characterization of processor performance in the VAX-11/780. In Proc. of the 11th Ann. Symp. on Computer Architecture, Ann Arbor, MI, 301–309, 1984
Eustace, A. and Srivastava, A.: ATOM: a flexible interface for building high performance program analysis tools. In Proc. of the USENIX Winter 1995 Technical Conf. on UNIX and Advanced Computing Systems, New Orleans, Louisiana, 303–314, January, 1995
Flanagan, J. K., Nelson, B. E., Archibald, J. K., and Grimsrud, K.: BACH: BYU address collection hardware, the collection of complete traces. In Proc. of the 6th Int. Conf. on Modelling Techniques and Tools for Computer Performance Evaluation, 128–137, 1992
Gee, J., Hill, M., Pnevmatikatos, D., and Smith, A. J.: Cache performance of the SPEC92 benchmark suite. IEEE Micro (August): 17–27, 1993
Goldschmidt, S. and Hennessy, J.: The accuracy of trace-driven simulation of multiprocessors. Stanford University Technical Report CSL-TR-92-546, September 1992
Goldschmidt, S. and Hennessy, J.: The accuracy of trace-driven simulation of multiprocessors. In Proc. of the 1993 ACM SIGMETRICS Conf. on Measurement and Modeling of Computer Systems, 146–157, May 1993
Hammerstrom, D. and Davidson, E.: Information content of CPU memory referencing behavior. In Proc. of the 4th Int. Symp. on Computer Architecture, 184–192, 1977
Hill, M.: Aspects of cache memory and instruction buffer performance. Ph.D. dissertation, The University of California at Berkeley, 1987
Hill, M. and Smith, A.: Evaluating associativity in CPU caches. IEEE Transactions on Computers 38(12): 1612–1630, 1989
Holliday, M.: Techniques for cache and memory simulation using address reference traces. Int. Journal in Computer Simulation 1: 129–151, 1991
IBM: IBM RISC System/6000 Technology. Austin, TX, IBM, 1990
Jouppi, N.: Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers. In Proc. of the 17th Ann. Int. Symp. on Computer Architecture, Seattle, WA, IEEE, 364–373, 1990
Kaeli, D.: Issues in trace-driven simulation. In Proc. of the 22nd Ann. Pittsburgh Modeling and Simulation Conf., Vol. 22, Part 5, May, 2533–2540, 1991
Kessler, R.: Analysis of multi-megabyte secondary CPU cache memories. Ph.D. dissertation, University of Wisconsin-Madison, 1991
Laha, S., Patel, J., and Iyer, R.: Accurate low-cost methods for performance evaluation of cache memory systems. IEEE Transactions on Computers 37(11): 1325–1336, 1988
Larus, J. R.: Abstract execution: A technique for efficiently tracing programs. Software Practice and Experience, 20(12):1241–1258, December, 1990
Larus, J.: SPIM S20: A MIPS R2000 Simulator. University of Wisconsin-Madison Technical Report, Revision 9, 1991
Larus, J. R.: Efficient program tracing. IEEE Computer, May: 52–60, 1993
Larus, J. R. and Schnorr, E.: EEL: Machine independent executable editing. In Proc. SIGPLAN Conf. on Programming Language Design and Implementation, June, 1995
Lebeck, A. and Wood, D.: Fast-Cache: A new abstraction for memory-system simulation. University of Wisconsin-Madison Technical Report 1211, 1994
Lebeck, A. and Wood, D.: Active Memory: A new abstraction for memory-system simulation. In Proc. of the 1995 SIGMETRICS Conf. on the Measurement and Modeling of Computer Systems, May, 220–230, 1995
Lee, C.-C.: A case study of a hardware-managed TLB in a multi-tasking environment. University of Michigan Technical Report, 1994
Magnusson, P.: A design for efficient simulation of a multiprocessor. In Proc. of the 1993 Western Simulation Multiconference on Int. Workshop on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, 69–78, La Jolla, California, 1993
Martonosi, M., Gupta, A., and Anderson, T.: MemSpy: Analyzing memory system bottlenecks in programs. In Proc. of the 1992 SIGMETRICS Conf. on the Measurement and Modeling of Computer Systems, ACM, 1992
Martonosi, M., Gupta, A., and Anderson, T.: Effectiveness of trace sampling for performance debugging tools. In Proc. of the 1993 SIGMETRICS Conf. on the Measurement and Modeling of Computer Systems, Santa Clara, California, ACM, 248–259, 1993
Mattson, R. L., Gecsei, J., Slutz, D. R., and Traiger, I. L.: Evaluation techniques for storage hierarchies. IBM Systems Journal 9(2): 78–117, 1970
Maynard, A. M., Donnelly, C., and Olszewski, B.: Contrasting characteristics and cache performance of technical and multi-user commercial workloads. In Proc. of the Sixth Int. Conf. on Architectural Support for Programming Languages and Operating Systems, San Jose, CA, ACM, 145–156, 1994
MIPS: RISCompiler Languages Programmer’s Guide. MIPS, 1988
Mogul, J. C. and Borg, A.: The effect of context switches on cache performance. In Proc. of the 4th Int. Conf. on Architectural Support for Programming Languages and Operating Systems, Santa Clara, California, ACM, 75–84, 1991
Nagle, D., Uhlig, R., and Mudge, T.: Monster: A tool for analyzing the interaction between operating systems and computer architectures. University of Michigan Technical Report CSE-TR-147-92, 1992
Nagle, D., Uhlig, R., Stanley, T., Sechrest, S., Mudge, T., and Brown, R.: Design tradeoffs for software-managed TLBs. In Proc. of the 20th Ann. Int. Symp. on Computer Architecture, San Diego, California, IEEE, 27–38, 1993
Nagle, D., Uhlig, R., Mudge, T., and Sechrest, S.: Optimal allocation of on-chip memory for multiple-API operating systems. In Proc. of the 21st Int. Symp. on Computer Architecture, Chicago, IL, 1994
Pierce, J. and Mudge, T.: IDtrace — A tracing tool for i486 simulation. University of Michigan Technical Report CSE-TR-203-94, 1994
Pierce, J., Smith, M. D., and Mudge, T.: “Instrumentation tools,” in Fast Simulation of Computer Architectures (T. M. Conte and C. E. Gimarc, eds.), Kluwer Academic Publishers: Boston, MA, 1995
Pleszkun, A.: Techniques for compressing program address traces. Technical Report, Department of Electrical and Computer Engineering, University of Colorado-Boulder, 1994
Puzak, T.: Analysis of cache replacement algorithms. Ph.D. dissertation, University of Massachusetts, 1985
Reinhardt, S., Hill, M., Larus, J., Lebeck, A., Lewis, J., and Wood, D.: The Wisconsin Wind Tunnel: Virtual prototyping of parallel computers. In Proc. of the 1993 SIGMETRICS Int. Conf. on Measurement and Modeling of Computer Systems, Santa Clara, CA, ACM, 48–60, 1993
Reinhardt, S., Pfile, R., and Wood, D.: Decoupled hardware support for distributed shared memory. To appear in Proc. of the 23rd Ann. Int. Symp. on Computer Architecture, 1996
Romer, T., Lee, D., Voelker, G., Wolman, A., Wong, W., Baer, J., Bershad, B., and Levy, H.: The structure and performance of interpreters. To appear in the Proc. of the 7th Int. Conf. on Architectural Support for Programming Languages and Operating Systems, Cambridge, MA, October, 1996
Rosenblum, M., Herrod, S., Witchel, E., and Gupta, A.: Complete computer simulation: the SimOS approach. IEEE Parallel and Distributed Technology, Fall 1995
Samples, A.: Mache: no-loss trace compaction. In Proc. of 1989 SIGMETRICS Conf. on Measurement and Modeling of Computer Systems, ACM, 89–97, 1989
Sites, R., Chernoff, A., Kirk, M., Marks, M., and Robinson, S.: Binary translation. Digital Technical Journal 4(4): 137–152, 1992
Smith, A. J.: Two methods for the efficient analysis of memory address trace data. IEEE Transactions on Software Engineering SE-3(1): 94–101, 1977
Smith, A. J.: Cache memories. Computing Surveys 14(3): 473–530, 1982
Smith, M. D.: Tracing with pixie. Technical Report, Stanford University, Stanford, CA, 1991
Srivastava, A. and Eustace, A.: ATOM: A system for building customized program analysis tools. In Proc. of the SIGPLAN ’94 Conf. on Programming Language Design and Implementation, 196–205, June 1994
Stephens, C., Cogswell, B., Heinlein, J., Palmer, G., and Shen, J.: Instruction level profiling and evaluation of the IBM RS/6000. In Proc. of the 18th Ann. Int. Symp. on Computer Architecture, Toronto, Canada, ACM, 180–189, 1991
Stunkel, C. and Fuchs, W.: TRAPEDS: producing traces for multicomputers via execution-driven simulation. In Proc. of the 1989 SIGMETRICS Conf. on Measurement and Modeling of Computer Systems, Berkeley, CA, ACM, 70–78, 1989
Stunkel, C., Janssens, B., and Fuchs, W. K.: Collecting address traces from parallel computers. In Proc. of the 24th Ann. Hawaii Int. Conf. on System Sciences, Hawaii, 373–383, 1991
Sugumar, R.: Multi-configuration simulation algorithms for the evaluation of computer designs. Ph.D. dissertation, University of Michigan, 1993
Talluri, M. and Hill, M.: Surpassing the TLB performance of superpages with less operating system support. In Proc. of the 6th Int. Conf. on Architectural Support for Programming Languages and Operating Systems, San Jose, CA, ACM, 1994
Thompson, J. and Smith, A.: Efficient (stack) algorithms for analysis of write-back and sector memories. ACM Transactions on Computer Systems 7(1): 78–116, 1989
Uhlig, R., Nagle, D., Mudge, T., and Sechrest, S.: Trap-driven simulation with Tapeworm II. In Proc. of the Sixth Int. Conf. on Architectural Support for Programming Languages and Operating Systems, San Jose, California, ACM Press (SIGARCH), 132–144, 1994
Uhlig, R., Nagle, D., Mudge, T., Sechrest, S., and Emer, J.: Instruction fetching: coping with code bloat. To appear in Proc. of the 22nd Int. Symp. on Computer Architecture, Santa Margherita Ligure, Italy, June, 1995
Uhlig, R. and Mudge, T.: Trace-driven memory simulation: A survey. Computing Surveys 29(2): 128–170, 1997
Upton, M. D.: Architectural trade-offs in a latency tolerant gallium arsenide microprocessor. Ph.D. dissertation, The University of Michigan, 1994
Veenstra, J. and Fowler, R.: MINT: A front end for efficient simulation of shared-memory multiprocessors. In Proc. of the 2nd Int. Workshop on Modeling, Analysis, and Simulation of Computer and Telecommunication systems (MASCOTS), 201–207, 1994
Wall, D.: Link-time code modification. DEC Western Research Lab Technical Report 89/17, 1989
Wall, D.: Systems for late code modification. DEC Western Research Lab Technical Report 92/3, 1992
Wang, W.-H. and Baer, J.-L.: Efficient trace-driven simulation methods for cache performance analysis. In Proc. of the 1990 SIGMETRICS Conf. on Measurement and Modeling of Computer Systems, Boulder, CO, ACM, 27–36, 1990
Witchel, E. and Rosenblum, M.: Embra: fast and flexible machine simulation. In Proc. of the 1996 SIGMETRICS Conf. on Measurement and Modeling of Computer Systems, Philadelphia, May, 1996
Copyright information
© 2000 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Uhlig, R.A., Mudge, T.N. (2000). Trace-Driven Memory Simulation: A Survey. In: Haring, G., Lindemann, C., Reiser, M. (eds) Performance Evaluation: Origins and Directions. Lecture Notes in Computer Science, vol 1769. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46506-5_5
DOI: https://doi.org/10.1007/3-540-46506-5_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67193-0
Online ISBN: 978-3-540-46506-5
eBook Packages: Springer Book Archive