A Parallel Dynamic Binary Translator for Efficient Multi-Core Simulation

International Journal of Parallel Programming

Abstract

In recent years multi-core processors have seen broad adoption in application domains ranging from embedded systems through general-purpose computing to large-scale data centres. Simulation technology for multi-core systems, however, lags behind and does not provide the simulation speed required to effectively support design space exploration and parallel software development. While state-of-the-art instruction set simulators (ISS) for single-core machines reach or exceed the performance levels of speed-optimised silicon implementations of embedded processors, the same does not hold for multi-core simulators, where large performance penalties are to be paid. In this paper we develop a fast and scalable simulation methodology for multi-core platforms based on parallel and just-in-time (JIT) dynamic binary translation (DBT). Our approach can model large-scale multi-core configurations, does not rely on prior profiling, instrumentation, or compilation, and works for all binaries targeting a state-of-the-art embedded multi-core platform implementing the ARCompact instruction set architecture (ISA). We have evaluated our parallel simulation methodology against the industry-standard SPLASH-2 and EEMBC MultiBench benchmarks and demonstrate simulation speeds up to 25,307 MIPS on a 32-core x86 host machine for as many as 2,048 target processors whilst exhibiting minimal and near-constant overhead, including memory considerations.

Author information

Corresponding author

Correspondence to Volker Seeker.

About this article

Cite this article

Almer, O., Böhm, I., von Koch, T.E. et al. A Parallel Dynamic Binary Translator for Efficient Multi-Core Simulation. Int J Parallel Prog 41, 212–235 (2013). https://doi.org/10.1007/s10766-012-0222-9

  • DOI: https://doi.org/10.1007/s10766-012-0222-9
