
HyperX topology: first at-scale implementation and comparison to the fat-tree

Published: 17 November 2019

Abstract

The de facto standard topology for modern HPC systems and data centers is the folded Clos network, commonly known as the Fat-Tree. The number of network endpoints in these systems is steadily increasing, but switch radix is not keeping pace, which forces longer paths in these multi-level trees and will limit gains for latency-sensitive applications. Additionally, today's Fat-Trees force the extensive use of active optical cables, which carry a prohibitive cost structure at scale. To tackle these issues, researchers have proposed various low-diameter topologies, such as Dragonfly. Another novel, but so far only theoretically studied, option is the HyperX. We built the world's first 3 Pflop/s supercomputer with two separate networks, a 3-level Fat-Tree and a 12×8 HyperX. This dual-plane system allows us to perform a side-by-side comparison using a broad set of benchmarks. We show that the HyperX, together with our novel communication-pattern-aware routing, can challenge the performance of, or even outperform, traditional Fat-Trees.
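
To make the "low-diameter" property concrete, the following is a minimal sketch of the switch-level graph of a 12×8 HyperX. It assumes the regular HyperX construction in which every switch is directly linked to all other switches that share all but one coordinate, with a single link per switch pair. The abstract does not state the trunking factor or the number of endpoints per switch, so both are left out, and the function names are illustrative only.

```python
from collections import deque
from itertools import product


def hyperx_neighbors(switch, shape):
    """Return all switches that differ from `switch` in exactly one coordinate."""
    neighbors = []
    for dim, size in enumerate(shape):
        for value in range(size):
            if value != switch[dim]:
                neighbors.append(switch[:dim] + (value,) + switch[dim + 1:])
    return neighbors


def hyperx_diameter(shape):
    """Breadth-first search from one switch.

    The graph looks the same from every switch, so the largest hop count
    seen from the origin equals the diameter of the switch graph.
    """
    start = tuple(0 for _ in shape)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        for nxt in hyperx_neighbors(cur, shape):
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    return max(dist.values())


shape = (12, 8)  # dimensions taken from the abstract
switches = list(product(*(range(s) for s in shape)))
degree = len(hyperx_neighbors(switches[0], shape))  # inter-switch ports per switch
links = len(switches) * degree // 2                 # each link is counted from both ends

print(len(switches), degree, links, hyperx_diameter(shape))
```

Under these assumptions the sketch reports 96 switches, 18 inter-switch ports per switch (11 + 7), 864 inter-switch links, and a diameter of 2 switch-to-switch hops, one hop per dimension in which source and destination differ.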

Published In

SC '19: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
November 2019
1921 pages
ISBN:9781450362290
DOI:10.1145/3295500
Sponsors

In-Cooperation

  • IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. HyperX
  2. InfiniBand
  3. PARX
  4. fat-tree
  5. network topology
  6. routing

Qualifiers

  • Research-article

Conference

SC '19
Acceptance Rates

Overall Acceptance Rate 1,516 of 6,373 submissions, 24%

Cited By

  • (2024) Swing. Proceedings of the 21st USENIX Symposium on Networked Systems Design and Implementation, pp. 1445-1462. DOI: 10.5555/3691825.3691905. Online publication date: 16-Apr-2024
  • (2024) Topologies for Blockchain Payment Channel Networks: Models and Constructions. IEEE/ACM Transactions on Networking 32:6, pp. 4781-4797. DOI: 10.1109/TNET.2024.3445274. Online publication date: Dec-2024
  • (2024) An Evaluation of the Effect of Network Cost Optimization for Leadership Class Supercomputers. SC24: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-16. DOI: 10.1109/SC41406.2024.00037. Online publication date: 17-Nov-2024
  • (2024) MUSE: A Runtime Incrementally Reconfigurable Network Adapting to HPC Real-Time Traffic. 2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 765-779. DOI: 10.1109/IPDPS57955.2024.00073. Online publication date: 27-May-2024
  • (2024) A Bandwidth-Optimal All-to-All Communication in Two-Dimensional Fully Connected Network. 2024 IEEE 24th International Symposium on Cluster, Cloud and Internet Computing (CCGrid), pp. 1-7. DOI: 10.1109/CCGrid59990.2024.00010. Online publication date: 6-May-2024
  • (2024) Trade-off topology design for hierarchical network based on job characteristics. CCF Transactions on High Performance Computing. DOI: 10.1007/s42514-024-00193-z. Online publication date: 21-May-2024
  • (2023) Path Diversity and Survivability for the HyperX Datacenter Topology. IEEE Transactions on Network and Service Management 20:3, pp. 2370-2385. DOI: 10.1109/TNSM.2023.3285914. Online publication date: Sep-2023
  • (2023) Analysing Mechanisms for Virtual Channel Management in Low-Diameter Networks. 2023 IEEE 35th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), pp. 12-22. DOI: 10.1109/SBAC-PAD59825.2023.00011. Online publication date: 17-Oct-2023
  • (2023) Efficient Intra-Rack Resource Disaggregation for HPC Using Co-Packaged DWDM Photonics. 2023 IEEE International Conference on Cluster Computing (CLUSTER), pp. 158-172. DOI: 10.1109/CLUSTER52292.2023.00021. Online publication date: 31-Oct-2023
  • (2023) Realizing Optimal All-to-All Personalized Communication Using Butterfly-Based Networks. IEEE Access 11, pp. 51064-51083. DOI: 10.1109/ACCESS.2023.3279494. Online publication date: 2023
