Detailed and clock-driven simulation for HPC interconnection network

Zhou, Wenhao; Chen, Juan; Cui, Chen; Wang, Qian; Dong, Dezun; Tang, Yuhua

doi:10.1007/s11704-016-5035-3

Research Article
Published: 13 July 2016

Volume 10, pages 797–811 (2016)

Frontiers of Computer Science

Wenhao Zhou¹, Juan Chen¹, Chen Cui¹, Qian Wang¹, Dezun Dong² & Yuhua Tang¹

Abstract

The performance and energy consumption of high performance computing (HPC) interconnection networks have a significant impact on the supercomputer as a whole, so an HPC interconnection network simulation platform is an important tool for research on HPC software and hardware technologies. To evaluate the performance and energy consumption of HPC interconnection networks effectively, this article designs and implements a detailed, clock-driven HPC interconnection network simulation platform called HPC-NetSim. HPC-NetSim uses application-driven workloads and inherits the characteristics of the detailed and flexible cycle-accurate network simulator. It also offers a large set of configurable network parameters for topology and routing, and supports on/off states for routers. We compare the simulated execution time with the real execution time on a Tianhe-2 subsystem; the mean error is only 2.7%. In addition, we simulate network behavior under different network structures and low-power modes, and the results are consistent with the theoretical analyses.
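To make the term "clock-driven" concrete, the following is a minimal sketch and not the HPC-NetSim implementation, whose internals the abstract does not describe. It assumes a hypothetical unidirectional ring of routers, each forwarding at most one packet per cycle, with a per-router on/off flag used only as a crude energy proxy. The names `simulate_ring` and `mean_relative_error` are illustrative, and the error formula (mean of |simulated − measured| / measured) is an assumed definition of "mean error", not one confirmed by the source.

```python
from collections import deque


def simulate_ring(num_routers, packets, off_routers=(), max_cycles=10_000):
    """Toy clock-driven simulation of a unidirectional ring (hypothetical model).

    packets: list of (source, destination) router indices, injected at cycle 0.
    off_routers: indices of powered-off routers; they do no work and a packet
    that reaches one waits there forever (wake-up is not modeled), in which
    case the loop stops at max_cycles.
    Returns (cycles elapsed, list of powered-on cycle counts per router).
    """
    queues = [deque() for _ in range(num_routers)]
    powered_on = [i not in off_routers for i in range(num_routers)]
    active_cycles = [0] * num_routers
    for src, dst in packets:
        queues[src].append(dst)

    remaining = len(packets)
    cycle = 0
    while remaining > 0 and cycle < max_cycles:
        cycle += 1  # the global clock advances one tick at a time
        moves = []
        # Phase 1: every powered-on router is evaluated on every cycle,
        # whether or not it has traffic (the defining trait of clock-driven
        # simulation, as opposed to jumping between scheduled events).
        for i in range(num_routers):
            if not powered_on[i]:
                continue
            active_cycles[i] += 1
            if queues[i]:
                moves.append((i, queues[i].popleft()))
        # Phase 2: apply all moves together so a packet advances one hop per cycle.
        for i, dst in moves:
            if dst == i:
                remaining -= 1  # packet delivered at its destination router
            else:
                queues[(i + 1) % num_routers].append(dst)
    return cycle, active_cycles


def mean_relative_error(simulated, measured):
    """Mean of |simulated - measured| / measured (assumed error definition)."""
    if len(simulated) != len(measured) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(s - m) / m for s, m in zip(simulated, measured)) / len(measured)
```

In this toy model, `simulate_ring(4, [(0, 2)])` takes 3 cycles: two hops plus one cycle for the destination router to consume the packet. Sending a second packet along the same path adds one cycle of contention, and a router listed in `off_routers` accumulates no active cycles.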

Author information

Authors and Affiliations

State Key Laboratory of High Performance Computing, School of Computer, National University of Defense Technology, Changsha, 410073, China
Wenhao Zhou, Juan Chen, Chen Cui, Qian Wang & Yuhua Tang
Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha, 410073, China
Dezun Dong

Corresponding author

Correspondence to Juan Chen.

Additional information

Wenhao Zhou received his BS and MS degrees from the School of Computer, National University of Defense Technology, China, in 2013 and 2015, respectively. His research interests focus on energy-aware HPC interconnection networks and parallel software frameworks.

Juan Chen received her PhD degree from the Computer Department, National University of Defense Technology (NUDT), China, in 2007. She is now an associate professor in the Key Laboratory of High Performance Computing at NUDT. Her research interests focus on supercomputer systems, energy-aware interconnection network design, and parallel software frameworks.

Chen Cui received his BS degree from the School of Electronics Engineering and Computer Science at Peking University, China, in 2015, and is now an MS student at National University of Defense Technology, China. His research interests focus on large-scale parallel numerical simulation and parallel software frameworks.

Qian Wang received her BS degree from the School of Computer at National University of Defense Technology (NUDT), China, in 2011, and is now a PhD student at NUDT. Her research interests focus on large-scale parallel numerical simulation and parallel software frameworks.

Dezun Dong received his BS, MS, and PhD degrees from the National University of Defense Technology (NUDT), China, in 2002, 2004, and 2010, respectively. He is an associate professor in the College of Computer, NUDT. His research interests range across high performance computer systems, high speed interconnect networks, wireless networks, and distributed computing algorithms. Currently, he focuses on performance evaluation of high-performance interconnection networks for supercomputers and data centers. He is a member of the ACM, IEEE, and CCF.

Yuhua Tang received her BS and MS degrees from the Computer Department, National University of Defense Technology (NUDT), China, in 1983 and 1986, respectively. She is now a professor in the National Laboratory for Parallel and Distributed Processing at NUDT. Her research interests include supercomputer architecture and core router design.

Electronic supplementary material

Supplementary material, approximately 188 KB.

About this article

Cite this article

Zhou, W., Chen, J., Cui, C. et al. Detailed and clock-driven simulation for HPC interconnection network. Front. Comput. Sci. 10, 797–811 (2016). https://doi.org/10.1007/s11704-016-5035-3

Received: 25 January 2015
Accepted: 25 December 2015
Published: 13 July 2016
Issue Date: October 2016
DOI: https://doi.org/10.1007/s11704-016-5035-3
