Abstract
Modern architecture research relies heavily on detailed pipeline simulation. Simulating the full execution of an industry-standard benchmark can take weeks to months. Statistical sampling, and techniques like SimPoint that pick small sets of execution samples, have been shown to provide accurate results while significantly reducing simulation time. The inefficiencies in sampling are (a) needing the correct memory image to execute the sample, and (b) needing a warm architecture state when simulating the sample.
In this paper we examine efficient Sampling Startup techniques addressing two issues: how to represent the correct memory image during simulation, and how to deal with warmup. Representing the correct memory image ensures that the memory values consumed during the sample's simulation are correct. Warmup techniques focus on reducing the error that arises when the architecture state is not fully representative of the complete execution that precedes the sample to be simulated. This paper presents several Sampling Startup techniques and compares them against previously proposed techniques. The end result is a practical sampled simulation methodology that provides accurate performance estimates of complete benchmark executions on the order of minutes.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Van Biesbrouck, M., Eeckhout, L., Calder, B. (2005). Efficient Sampling Startup for Sampled Processor Simulation. In: Conte, T., Navarro, N., Hwu, Wm.W., Valero, M., Ungerer, T. (eds) High Performance Embedded Architectures and Compilers. HiPEAC 2005. Lecture Notes in Computer Science, vol 3793. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11587514_5
DOI: https://doi.org/10.1007/11587514_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-30317-6
Online ISBN: 978-3-540-32272-6
eBook Packages: Computer Science, Computer Science (R0)