DOI: 10.1145/3225058.3225081
research-article

Load-Balanced Slim Fly Networks

Published: 13 August 2018

Abstract

The Slim Fly topology has recently been proposed for future-generation supercomputers. It has a small diameter and relies on Universal Globally Adaptive Load-balanced (UGAL) routing, which adapts routes between minimal (MIN) routing and Valiant Load-Balancing (VLB) routing to exploit the network capacity. In this work, we show that the current Slim Fly is load-balanced under neither MIN routing nor VLB routing: certain links in the network have a significantly higher probability of carrying traffic than others, so hot spots are more likely to form on those links. We propose two approaches to address this problem and make Slim Fly load-balanced: (1) modifying the topology by selectively increasing the bandwidth of the potential hot-spot links so that the original routing becomes load-balanced, and (2) modifying the routing scheme by using a weighted VLB routing that distributes traffic in a more load-balanced fashion than the original VLB routing on the original Slim Fly. The results of our performance analysis and simulation demonstrate that both approaches yield a more effective Slim Fly than its current form.
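The notion of a link having a "higher probability to carry traffic" under MIN routing can be illustrated with a minimal sketch. This is not the paper's analysis or a Slim Fly construction: the five-switch graph `adj`, the helper names `shortest_paths` and `min_routing_link_load`, and the assumptions of uniform all-to-all traffic with an even split over all shortest paths are hypothetical choices made only for illustration.

```python
from collections import defaultdict, deque
from itertools import permutations


def shortest_paths(adj, src, dst):
    """Return every shortest path from src to dst as a list of node lists."""
    dist = {src: 0}
    parents = defaultdict(list)  # parents[v] = predecessors of v on shortest paths
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
            if dist[v] == dist[u] + 1:
                parents[v].append(u)

    def unwind(node):
        # Rebuild paths by walking predecessor lists back to src.
        if node == src:
            return [[src]]
        return [p + [node] for par in parents[node] for p in unwind(par)]

    return unwind(dst) if dst in dist else []


def min_routing_link_load(adj):
    """Expected load per directed link when every ordered switch pair sends
    one unit of traffic, split evenly over its shortest paths (assumption)."""
    load = defaultdict(float)
    for src, dst in permutations(adj, 2):
        paths = shortest_paths(adj, src, dst)
        for path in paths:
            for u, v in zip(path, path[1:]):
                load[(u, v)] += 1.0 / len(paths)
    return dict(load)


# Hypothetical toy topology (NOT a Slim Fly): a 4-cycle 0-1-2-3 plus switch 4
# attached to switch 0.
adj = {0: [1, 3, 4], 1: [0, 2], 2: [1, 3], 3: [0, 2], 4: [0]}

load = min_routing_link_load(adj)
for link in sorted(load):
    print(link, load[link])
print("max/min load ratio:", max(load.values()) / min(load.values()))
```

In this toy graph the links between switches 0 and 4 carry more expected traffic than the links on the far side of the cycle. Dividing each entry of `load` by a per-link capacity gives a rough picture of how adding bandwidth to the heavier links, in the spirit of approach (1), could even out utilization.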

Cited By

  • (2021) High-Performance Routing With Multipathing and Path Diversity in Ethernet and HPC Networks. IEEE Transactions on Parallel and Distributed Systems 32:4 (943-959). DOI: 10.1109/TPDS.2020.3035761. Online publication date: 1-Apr-2021
  • (2019) Modeling Universal Globally Adaptive Load-Balanced Routing. ACM Transactions on Parallel Computing 6:2 (1-23). DOI: 10.1145/3349620. Online publication date: 30-Aug-2019
  • (2019) Topology-custom UGAL routing on dragonfly. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (1-15). DOI: 10.1145/3295500.3356208. Online publication date: 17-Nov-2019

Published In

ICPP '18: Proceedings of the 47th International Conference on Parallel Processing
August 2018
945 pages
ISBN:9781450365109
DOI:10.1145/3225058
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

  • University of Oregon

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Slim Fly
  2. UGAL
  3. VLB
  4. Weight VLB
  5. load-balance

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICPP 2018

Acceptance Rates

ICPP '18 Paper Acceptance Rate 91 of 313 submissions, 29%;
Overall Acceptance Rate 91 of 313 submissions, 29%
