ArchSim: A System-Level Parallel Simulation Platform for the Architecture Design of High Performance Computer

  • Regular Paper
Journal of Computer Science and Technology

Abstract

A high performance computer (HPC) is a large, complex system whose architecture design faces increasing difficulty and risk. Traditional methods, such as theoretical analysis, component-level simulation and sequential simulation, are not applicable to system-level simulation of HPC systems. Even parallel simulation on large-scale parallel machines faces many difficulties in scalability, reliability, generality and efficiency. To meet the current needs of HPC architecture design, this paper proposes a system-level parallel simulation platform, ArchSim. We first introduce the architecture of the ArchSim simulation platform, which is composed of a global server (GS), local server agents (LSA) and entities. Second, we describe key techniques of ArchSim, including the synchronization protocol, the communication mechanism and the distributed checkpointing/restart mechanism. We then run a comprehensive test of the main performance indices of ArchSim with the phold benchmark and analyze the extra overhead introduced by ArchSim. Finally, based on ArchSim, we construct a parallel event-driven interconnection network simulator and a system-level simulator for a small-scale HPC system with 256 processors. The results of the performance test and the HPC system simulations demonstrate that ArchSim achieves a high speedup ratio and high scalability on a parallel host machine and supports system-level simulation for the architecture design of HPC systems.
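
For readers unfamiliar with the phold benchmark named in the abstract, the following is a minimal sequential sketch of the model as it is commonly described in the parallel discrete event simulation literature: a fixed population of events circulates among entities, and each processed event schedules exactly one new event for a randomly chosen entity at a later timestamp. It is not ArchSim's implementation, and the function name `run_phold` and all parameters (`num_entities`, `events_per_entity`, `lookahead`, `mean_delay`) are assumptions chosen for illustration; the abstract does not state the configuration used in the paper.

```python
import heapq
import random


def run_phold(num_entities=8, events_per_entity=2, end_time=100.0,
              lookahead=1.0, mean_delay=5.0, seed=0):
    """Sequential PHOLD sketch (hypothetical parameters, not ArchSim's).

    Returns (events processed per entity, last simulated time,
    number of events still pending when the run stops).
    """
    rng = random.Random(seed)
    queue = []  # min-heap of (timestamp, sequence number, destination entity)
    seq = 0     # tie-breaker so equal timestamps pop in insertion order

    # Seed the model with a fixed population of initial events.
    for entity in range(num_entities):
        for _ in range(events_per_entity):
            timestamp = rng.expovariate(1.0 / mean_delay)
            heapq.heappush(queue, (timestamp, seq, entity))
            seq += 1

    processed = [0] * num_entities
    now = 0.0
    # Process events in timestamp order until the next one is past end_time.
    while queue and queue[0][0] <= end_time:
        now, _, entity = heapq.heappop(queue)
        processed[entity] += 1
        # Each event spawns one successor for a random destination entity,
        # so the total event population stays constant.
        dest = rng.randrange(num_entities)
        delay = lookahead + rng.expovariate(1.0 / mean_delay)
        heapq.heappush(queue, (now + delay, seq, dest))
        seq += 1

    return processed, now, len(queue)


if __name__ == "__main__":
    counts, last_time, pending = run_phold()
    print("events processed per entity:", counts)
    print("last simulated time:", last_time, "pending events:", pending)
```

In a parallel run the entities would be partitioned across host processors, and the minimum delay (`lookahead` here) is what a conservative synchronization protocol typically exploits to process events safely; the sequential version above only fixes the workload that such a platform would be measured against.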

Author information

Correspondence to Hong-Liang Li.

Additional information

This work is supported by the National High Technology Research and Development 863 Program of China under Grant No. 2007AA01Z117, and the National Basic Research 973 Program of China under Grant No. 2007CB310900.

About this article

Cite this article

Huang, YQ., Li, HL., Xie, XH. et al. ArchSim: A System-Level Parallel Simulation Platform for the Architecture Design of High Performance Computer. J. Comput. Sci. Technol. 24, 901–912 (2009). https://doi.org/10.1007/s11390-009-9281-9

  • DOI: https://doi.org/10.1007/s11390-009-9281-9
