The RASE (Rapid, Accurate Simulation Environment) for chip multiprocessors

Authors:

James Laudon

ACM SIGARCH Computer Architecture News, Volume 33, Issue 4

Pages 14 - 23

https://doi.org/10.1145/1105734.1105738

Published: 01 November 2005

Abstract

We present RASE, a full-system, high-performance simulation methodology for complex server applications running on server-class chip multiprocessors with fine-grain multithreading (CMTs). RASE combines application knowledge, operating system information, and data access patterns with an instruction stream from a highly tuned, scalable, steady-state benchmark [5] [22] to generate multiple representative instruction streams that can be mapped to a variety of CMT configurations. We use execution-driven simulation to generate instruction streams for M processors and store them as instruction trace files (several billion instructions per processor), which can be post-processed and augmented to simulate systems with more than M processors. We use SPEC JBB2000, TPC-C, and an XML server benchmark to compare the performance estimates of RASE against a reference prototype CMT system. Across different values of M, our trace-driven simulation methodology predicts the instructions per cycle (IPC) of the reference hardware to within 5% for these applications. Without post-processing of the traces, prediction accuracy degrades, even in the best cases, to 20-40% of the real IPC for instruction traces that require a high replication factor.
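The abstract does not define "replication factor" or the accuracy metric precisely, so the following is only a minimal sketch under stated assumptions: the replication factor is taken to be the number of times each of the M recorded traces is reused to cover a larger target system, traces are assigned round-robin, and prediction error is the relative difference between predicted and measured IPC. All function names are hypothetical, and the trace post-processing that RASE performs is not modeled here.

```python
import math


def replication_factor(target_processors: int, traced_processors: int) -> int:
    """Assumed definition: how many times each of the M recorded traces
    must be reused to cover the target processor count."""
    if target_processors <= 0 or traced_processors <= 0:
        raise ValueError("processor counts must be positive")
    return math.ceil(target_processors / traced_processors)


def assign_traces(target_processors: int, traced_processors: int) -> list[tuple[int, int]]:
    """Hypothetical round-robin mapping: simulated processor p replays
    trace (p mod M) as copy number (p div M)."""
    if target_processors <= 0 or traced_processors <= 0:
        raise ValueError("processor counts must be positive")
    return [(p % traced_processors, p // traced_processors)
            for p in range(target_processors)]


def relative_ipc_error(predicted_ipc: float, measured_ipc: float) -> float:
    """Relative error of a predicted IPC against the reference hardware IPC."""
    if measured_ipc <= 0:
        raise ValueError("measured IPC must be positive")
    return abs(predicted_ipc - measured_ipc) / measured_ipc


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from the paper.
    print(replication_factor(32, 8))
    print(assign_traces(4, 2))
    print(relative_ipc_error(0.95, 1.0) <= 0.05 + 1e-12)
```

Under these assumptions, a copy number greater than zero marks a replayed duplicate of a recorded trace, which is the kind of stream the abstract says needs post-processing before the prediction stays accurate.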

References

[1]

K. Aingaran, P. Kongetira, et al., "A 32-way Multithreaded SPARC® Processor," 16th Hot Chips Symposium, Aug. 2004.

[2]

A. R. Alameldeen, C. J. Mauer, et al., "Evaluating Non-deterministic Multi-threaded Commercial Workloads," Computer Architecture Evaluation using Commercial Workloads (CAECW), February 2002.

[3]

A. R. Alameldeen and D. A. Wood, "Variability in Architectural Simulations of Multi-threaded Workloads," 9th Int'l Symp. on High Performance Computer Architecture (HPCA), Feb. 2003.

[4]

A. R. Alameldeen, M. M. K. Martin, et al., "Simulating a $2M commercial server on a $2K PC," Computer, Vol. 36, No. 2, Feb. 2003, pp. 50--57.

[5]

M. Annavaram, et al. "The Fuzzy Correlation between Code and Performance Predictability," MICRO-37, Dec. 2004

[6]

L. Barroso, K. Gharachorloo, and E. Bugnion, "Memory System Characterization of Commercial Workloads," Proc. of the 25th Annual Int'l Symp. on Computer Architecture (ISCA), June 1998, pp: 3--14.

[7]

L. Barroso, K. Gharachorloo, R. McNamara, et al., "Piranha: a scalable architecture based on single-chip multiprocessing," ISCA-27, June 2000, pp: 282--293.

[8]

S. Basu, S. Roy, R. Kumar, T. Fisher, B. E. Blaho, "Peppermint and Sled: tools for evaluating SMP systems based on IA-64 (IPF) processors," Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS 2002), 15--19 April 2002, pp: 54--63.

[9]

R. Bedichek, "SimNow™: Fast Platform Simulation Purely in Software," 16th Hot Chips Symp., August 2004.

[10]

J. D. Davis, J. Laudon, and K. Olukotun, "Maximizing CMP Throughput with Mediocre Cores," Int'l Conference on Parallel Architectures and Compilation Techniques (PACT), Sept. 2005, pp. 51--62.

[11]

F. Eskesen, et al., "Performance Analysis of Simultaneous Multithreading in a PowerPC-based Processor," IBM Research Report, May 2002, RC22454.

[12]

J. Gibson, R. Kunz, et al. "FLASH vs. (Simulated) FLASH: Closing the Simulation Loop", In Proceedings of the 9th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), pp: 49--58, November 2000.

[13]

J. Huh, S. W. Keckler and D. Burger, "Exploring the Design Space of Future CMPs," PACT, Sept. 2001 pp. 199--210.

[14]

H. Khalid, "Validating trace-driven microarchitectural simulations," IEEE Micro, Vol. 20, No. 6, Nov.-Dec. 2000, pp. 76--82.

[15]

S. Kunkel, B. Armstrong, P. Vitale, "System optimization for OLTP workloads," IEEE Micro, Vol. 19, No. 3, May-June 1999, pp. 56--64.

[16]

S. Kunkel, et al., "A performance methodology for commercial servers," IBM Journal of Research and Development, Vol. 44, Number 6, 2000.

[17]

J. Laudon, A. Gupta, and M. Horowitz, "Interleaving: A Multithreading Technique Targeting Multiprocessors and Workstations," Proc. of the 6th Int'l Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS), Oct. 1994, pp: 308--318.

[18]

J. Lo, L. Barroso, S. Eggers, K. Gharachorloo, et al., "An Analysis of Database Workload Performance on Simultaneous Multithreaded Processors," ISCA-25, June 1998, pp: 39--50.

[19]

P. Magnusson, M. Christensson, et al., "Simics: A Full System Simulation Platform," Computer, February 2002, pp: 50--58.

[20]

P. Ranganathan, K. Gharachorloo, et al., "Performance of Database Workloads on Shared-Memory Systems with Out-of-Order Processors," ASPLOS-8, Oct. 1998, pp: 307--318.

[21]

T. Sherwood, S. Sair, and B. Calder, "Phase Tracking and Prediction," ISCA-30, June 2003.

[22]

L. Spracklen and S. Abraham, "Chip Multithreading: Opportunities and Challenges," HPCA-11, Feb. 2005.

[23]

R. Stets, L. A. Barroso, et al., "A Detailed Comparison of TPC-C versus TPC-B," Third Workshop on CAECW, January 2000.

[24]

Standard Performance Evaluation Corporation, SPEC*, http://www.spec.org, Warrenton, VA

[25]

Sun Microsystems Inc., "XML Processing Performance in Java and .Net," http://java.sun.com/performance/reference/whitepapers/XML_Test-1_0.pdf

[26]

Transaction Processing Performance Council, TPC-*, http://www.tpc.org, San Francisco, CA

[27]

R. E. Wunderlich, T. F. Wenisch, B. Falsafi, J. C. Hoe, "SMARTS: Accelerating Microarchitecture Simulation via Rigorous Statistical Sampling," ISCA-30, June 2003

[28]

X. Qin and J.-L. Baer, "A comparative study of conservative and optimistic trace-driven simulations," Proceedings of the 28th Annual Simulation Symposium, 9--13 April 1995, pp. 42--50.

Published In

ACM SIGARCH Computer Architecture News Volume 33, Issue 4

Special issue: dasCMP'05

November 2005

130 pages

ISSN:0163-5964

DOI:10.1145/1105734

Copyright © 2005 Authors.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Article

Cited By

Klyatis L (2011). References. Accelerated Reliability and Durability Testing Technology, pp. 393-405. Online publication date: 7-Dec-2011.
https://doi.org/10.1002/9780470541609.refs
Lin Y, Lin Y, Lai Y (2010). Thread allocation in CMP-based multithreaded network processors. Parallel Computing 36:2-3, pp. 104-116. Online publication date: 1-Feb-2010.
https://dl.acm.org/doi/10.1016/j.parco.2010.01.001
Lin Y, Lin Y, Tseng K, Lai Y (2009). Modeling and analysis of core-centric network processors. ACM Transactions on Embedded Computing Systems 8:2, pp. 1-15. Online publication date: 9-Feb-2009.
https://dl.acm.org/doi/10.1145/1457255.1457260
Han W, Yi Y, Muir M, Nousias I, Arslan T, Erdogan A (2009). Multicore architectures with dynamically reconfigurable array processors for wireless broadband technologies. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 28:12, pp. 1830-1843. Online publication date: 1-Dec-2009.
https://dl.acm.org/doi/10.1109/TCAD.2009.2032361
Wu J, Pan X, Liu G, Yang X (2009). SEMCS. Proceedings of the 2009 WASE International Conference on Information Engineering - Volume 02, pp. 192-196. Online publication date: 10-Jul-2009.
https://dl.acm.org/doi/10.1109/ICIE.2009.220
Lin Y, Lin Y, Lai Y, Tseng K (2008). Modeling and analysis of core-centric network processors. ACM Transactions on Embedded Computing Systems 7:4, pp. 1-15. Online publication date: 1-Aug-2008.
https://dl.acm.org/doi/10.1145/1376804.1376809
Han W, Yi Y, Muir M, Nousias I, Arslan T, Erdogan A (2008). MRPSIM: A TLM based simulation tool for MPSOCS targeting dynamically reconfigurable processors. 2008 IEEE International SOC Conference, pp. 41-44. Online publication date: Sep-2008.
https://doi.org/10.1109/SOCC.2008.4641476
Lin Y, Lin Y, Lai Y (2008). Thread Allocation in Chip Multiprocessor Based Multithreaded Network Processors. Proceedings of the 22nd International Conference on Advanced Information Networking and Applications, pp. 718-725. Online publication date: 25-Mar-2008.
https://dl.acm.org/doi/10.1109/AINA.2008.50
