research-article

ParTejas: A Parallel Simulator for Multicore Processors

Published: 02 August 2017

Abstract

In this article, we present the design of a novel parallel architecture simulator called ParTejas. ParTejas is a timing simulation engine that obtains its execution traces from instrumented binaries through a fast shared-memory-based mechanism. Subsequently, the waiting threads simulate the execution of multiple pipelines and an elaborate memory system with support for multilevel coherent caches. ParTejas is written in Java and derives its speedups primarily from the use of novel data structures. Specifically, it uses lock-free slot schedulers to design an entity called a parallel port, which models the contention at shared resources in the CPU and memory system. Parallel ports remove the need for fine-grained synchronization and allow each thread to use its local clock. Unlike conventional simulators that use barriers for synchronization at epoch boundaries, we use a more sophisticated type of barrier known as a phaser. A phaser allows threads to perform additional work without waiting for other threads to arrive at the barrier. Additionally, we apply a host of Java-specific optimizations and use profiling to schedule the threads effectively. With all our optimizations, we demonstrate a speedup of 11.8× for a multi-issue in-order pipeline and 10.9× for an out-of-order pipeline with 64 threads, for a suite of seven Splash2 and Parsec benchmarks. The simulation error is limited to 2% to 4% compared with strictly sequential simulation.
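To make the idea of a parallel port built on lock-free slot scheduling more concrete, the following is a minimal sketch in Java. It is not the ParTejas implementation: the class name `ParallelPortSketch`, the method `reserve`, the ring-buffer layout, and the window size are all assumptions introduced for illustration. The sketch models a shared resource that can serve one request per cycle. A simulation thread asks for the earliest free cycle at or after its own local clock and claims it with a single compare-and-set, so no lock or global clock is needed.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.atomic.AtomicLongArray;

// Hypothetical sketch of a lock-free "parallel port"; names and layout are
// illustrative assumptions, not taken from ParTejas.
final class ParallelPortSketch {
    private static final long FREE = -1L;

    // owners[i] holds the cycle that currently owns ring entry i, or FREE.
    private final AtomicLongArray owners;

    ParallelPortSketch(int window) {
        owners = new AtomicLongArray(window);
        for (int i = 0; i < window; i++) {
            owners.set(i, FREE);
        }
    }

    /**
     * Reserves the earliest available cycle in
     * [requestedCycle, requestedCycle + window). Assumes requestedCycle >= 0.
     * Returns the granted cycle, or -1 if every cycle in the window is taken.
     */
    long reserve(long requestedCycle) {
        int window = owners.length();
        for (long cycle = requestedCycle; cycle < requestedCycle + window; cycle++) {
            int idx = (int) (cycle % window);
            long owner = owners.get(idx);
            // An entry is usable if it is empty or still holds an older cycle.
            // A failed compare-and-set means another thread claimed the entry,
            // so this thread simply moves on to the next cycle.
            if (owner < cycle && owners.compareAndSet(idx, owner, cycle)) {
                return cycle;
            }
        }
        return -1L;
    }

    public static void main(String[] args) throws InterruptedException {
        ParallelPortSketch port = new ParallelPortSketch(16);
        Set<Long> granted = new ConcurrentSkipListSet<>();
        Thread[] workers = new Thread[4];
        // Four simulated cores all request the port at local cycle 10.
        for (int i = 0; i < workers.length; i++) {
            workers[i] = new Thread(() -> granted.add(port.reserve(10)));
            workers[i].start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        System.out.println("granted cycles: " + granted);
    }
}
```

In this sketch, each of the four workers should be granted a distinct cycle, so contention at the port appears as added delay rather than as blocking between host threads. One simplification to note: a thread that lags far behind may find a ring entry already owned by a later cycle and skip it, which is a conservative approximation chosen here to keep the example short.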


Published In

ACM Transactions on Modeling and Computer Simulation, Volume 27, Issue 3
July 2017
117 pages
ISSN:1049-3301
EISSN:1558-1195
DOI:10.1145/3130329
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 02 August 2017
Accepted: 01 April 2017
Revised: 01 March 2017
Received: 01 March 2016
Published in TOMACS Volume 27, Issue 3

Author Tags

  1. ParTejas
  2. Parallel simulation
  3. Tejas
  4. architectural simulator
  5. parallel ports
  6. phasers
  7. slot scheduling

Qualifiers

  • Research-article
  • Research
  • Refereed
