Modeling Large-Scale Slim Fly Networks Using Parallel Discrete-Event Simulation
Abstract
As supercomputers approach exascale performance, the increased number of processors translates to an increased demand on the underlying network interconnect. The slim fly network topology, a new low-diameter, low-latency, and low-cost interconnection network, is gaining interest as one possible solution for next-generation supercomputing interconnect systems. In this article, we present a high-fidelity slim fly packet-level model leveraging the Rensselaer Optimistic Simulation System (ROSS) and Co-Design of Exascale Storage (CODES) frameworks. We validate the model with published work before scaling the network size up to an unprecedented 1 million compute nodes and confirming that the slim fly observes peak network throughput at extreme scale. In addition to synthetic workloads, we evaluate large-scale slim fly models with real communication workloads from applications in the Design Forward program with over 110,000 MPI processes. We show strong scaling of the slim fly model on an Intel cluster achieving a peak network packet transfer rate of 2.3 million packets per second and processing over 7 billion discrete events using 128 MPI tasks. Enabled by the strong performance capabilities of the model, we perform a detailed application trace and routing protocol performance study. Lastly, through analysis of metrics such as packet latency, hop count, and congestion, we find that the slim fly network is able to leverage simple minimal routing and achieve the same performance as more complex adaptive routing for tested DOE benchmark applications.
- Authors:
- Wolfe, Noah; Mubarak, Misbah; Carothers, Christopher D.; Ross, Robert B.; Carns, Philip H.
- Rensselaer Polytechnic Inst., Troy, NY (United States)
- Argonne National Lab. (ANL), Lemont, IL (United States)
- Publication Date:
- 2018-08-30
- Research Org.:
- Argonne National Lab. (ANL), Argonne, IL (United States)
- Sponsoring Org.:
- USDOE Office of Science (SC), Advanced Scientific Computing Research (ASCR); Air Force Research Laboratory (AFRL)
- OSTI Identifier:
- 1488539
- Grant/Contract Number:
- AC02-06CH11357
- Resource Type:
- Journal Article: Accepted Manuscript
- Journal Name:
- ACM Transactions on Modeling and Computer Simulation
- Additional Journal Information:
- Journal Volume: 28; Journal Issue: 4; Journal ID: ISSN 1049-3301
- Publisher:
- Association for Computing Machinery
- Country of Publication:
- United States
- Language:
- English
- Subject:
- 97 MATHEMATICS AND COMPUTING; Parallel discrete event simulation; Interconnection networks; Network topologies; Slim Fly
Citation Formats
Wolfe, Noah, Mubarak, Misbah, Carothers, Christopher D., Ross, Robert B., and Carns, Philip H. Modeling Large-Scale Slim Fly Networks Using Parallel Discrete-Event Simulation. United States: N. p., 2018. Web. doi:10.1145/3203406.
Wolfe, Noah, Mubarak, Misbah, Carothers, Christopher D., Ross, Robert B., & Carns, Philip H. Modeling Large-Scale Slim Fly Networks Using Parallel Discrete-Event Simulation. United States. https://doi.org/10.1145/3203406
Wolfe, Noah, Mubarak, Misbah, Carothers, Christopher D., Ross, Robert B., and Carns, Philip H. 2018. "Modeling Large-Scale Slim Fly Networks Using Parallel Discrete-Event Simulation". United States. https://doi.org/10.1145/3203406. https://www.osti.gov/servlets/purl/1488539.
@article{osti_1488539,
title = {Modeling Large-Scale Slim Fly Networks Using Parallel Discrete-Event Simulation},
author = {Wolfe, Noah and Mubarak, Misbah and Carothers, Christopher D. and Ross, Robert B. and Carns, Philip H.},
abstractNote = {As supercomputers approach exascale performance, the increased number of processors translates to an increased demand on the underlying network interconnect. The slim fly network topology, a new low-diameter, low-latency, and low-cost interconnection network, is gaining interest as one possible solution for next-generation supercomputing interconnect systems. In this article, we present a high-fidelity slim fly packet-level model leveraging the Rensselaer Optimistic Simulation System (ROSS) and Co-Design of Exascale Storage (CODES) frameworks. We validate the model with published work before scaling the network size up to an unprecedented 1 million compute nodes and confirming that the slim fly observes peak network throughput at extreme scale. In addition to synthetic workloads, we evaluate large-scale slim fly models with real communication workloads from applications in the Design Forward program with over 110,000 MPI processes. We show strong scaling of the slim fly model on an Intel cluster achieving a peak network packet transfer rate of 2.3 million packets per second and processing over 7 billion discrete events using 128 MPI tasks. Enabled by the strong performance capabilities of the model, we perform a detailed application trace and routing protocol performance study. Lastly, through analysis of metrics such as packet latency, hop count, and congestion, we find that the slim fly network is able to leverage simple minimal routing and achieve the same performance as more complex adaptive routing for tested DOE benchmark applications.},
doi = {10.1145/3203406},
url = {https://www.osti.gov/biblio/1488539},
journal = {ACM Transactions on Modeling and Computer Simulation},
issn = {1049-3301},
number = 4,
volume = 28,
place = {United States},
year = {2018},
month = {8}
}