Topology-custom UGAL routing on dragonfly

Published: 17 November 2019

Abstract

The Dragonfly network has been deployed in current-generation supercomputers and will be used in next-generation supercomputers. Universal Globally Adaptive Load-balance routing (UGAL) is the state-of-the-art routing scheme for Dragonfly. In this work, we show that the performance of conventional UGAL can be further improved on many practical Dragonfly networks, especially those with a small number of groups, by customizing the paths used in UGAL for each topology. We develop a scheme to compute the custom set of paths for each topology and compare the performance of our topology-custom UGAL routing (T-UGAL) with that of conventional UGAL. Our evaluation with different UGAL variations and different topologies demonstrates that, by customizing the routes, T-UGAL offers significant improvements over UGAL on many practical Dragonfly networks, in both latency under low network load and throughput under high network load.
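
To make the role of "the paths used in UGAL" more concrete, here is a minimal Python sketch of a UGAL-style routing decision. It is an illustration under stated assumptions, not the scheme from the paper: the names `Path`, `estimated_delay`, `ugal_choose`, and the `bias` parameter are hypothetical, and the delay estimate (queue occupancy times hop count) is the form commonly used to describe UGAL. In this picture, a topology-custom variant would differ only in which non-minimal paths appear in `candidates`. How that set is computed is not described in the abstract and is not shown here.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    """Hypothetical summary of one candidate route for a packet."""
    hops: int       # number of links the packet would traverse
    queue_len: int  # occupancy of the first-hop output queue (a local estimate)


def estimated_delay(path: Path) -> int:
    # Assumed UGAL-style estimate: queue occupancy multiplied by hop count.
    return path.queue_len * path.hops


def ugal_choose(minimal: Path, candidates: list[Path],
                rng: random.Random, bias: int = 0) -> Path:
    """Choose between the minimal path and one random non-minimal candidate.

    `candidates` is the set of non-minimal paths the router may use.
    Restricting this set per topology is the only place where a
    topology-custom variant would differ in this sketch.
    """
    if not candidates:
        return minimal
    non_minimal = rng.choice(candidates)
    # Prefer the minimal path unless it looks more congested;
    # `bias` (an assumption) tilts the comparison toward the minimal path.
    if estimated_delay(minimal) <= estimated_delay(non_minimal) + bias:
        return minimal
    return non_minimal


rng = random.Random(0)
detour = Path(hops=6, queue_len=2)
congested_minimal = Path(hops=3, queue_len=10)
idle_minimal = Path(hops=3, queue_len=1)

chosen_under_load = ugal_choose(congested_minimal, [detour], rng)
chosen_when_idle = ugal_choose(idle_minimal, [detour], rng)
```

With a congested minimal path the longer detour has the lower estimate and is chosen; with an idle minimal path the packet stays on the minimal route.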

Published In

SC '19: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis
November 2019
1921 pages
ISBN:9781450362290
DOI:10.1145/3295500
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

In-Cooperation

  • IEEE CS

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. UGAL routing
  2. dragonfly
  3. interconnection network

Qualifiers

  • Research-article

Funding Sources

  • NSF (National Science Foundation), Los Alamos National Laboratory

Conference

SC '19

Acceptance Rates

Overall Acceptance Rate 1,516 of 6,373 submissions, 24%
