Abstract
The interconnection network is a crucial subsystem in High-Performance Computing clusters and data centers, guaranteeing high bandwidth and low latency to the applications' communication operations. Unfortunately, congestion may degrade network performance unless the network design applies specific countermeasures. Adaptive routing algorithms are a traditional approach to dealing with congestion, since they provide traffic flows with alternative routes that bypass congested areas. However, adaptive routing decisions at switches are typically based on local information without a global view of network traffic, which leads congestion to spread throughout the network beyond the original congested areas. In this paper, we propose a new congestion management strategy that leverages the adaptive routing notifications currently available in some interconnect technologies and efficiently isolates the congesting flows in reserved spaces at switch buffers. Experimental results from simulations of realistic traffic scenarios show that our proposal removes the impact of congestion.
Data Availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
Notes
Note that the term CLOS is currently employed, mainly in the data-center context, to refer to topologies similar (if not identical) to Fat Trees. Although the original CLOS network was a three-stage non-blocking unidirectional topology, the fact that a Fat Tree can be built by recursively applying the CLOS construction method (followed by folding) has led to the terms CLOS and Fat Tree being used interchangeably.
Note that a link failure may also generate a congestion situation, as the network bisection bandwidth decreases due to this failure.
Through the ARN mechanism, switches also monitor link status to detect link failures. Hereafter, we focus only on the congestion monitoring functionality.
We assume the same criterion as defined in Sect. A10.1.2 of the InfiniBand specification.
Acknowledgements
This work is part of the R&D Project Grant PID2019-109001RA-I00, funded by MCIN/AEI/10.13039/501100011033. Moreover, this work has also been jointly supported by Junta de Comunidades de Castilla-La Mancha under the project SBPLY/17/180501/000498.
Ethics declarations
Conflict of interest
All authors declare that they have no conflicts of interest.
About this article
Cite this article
Rocher-Gonzalez, J., Escudero-Sahuquillo, J., Garcia, P.J. et al. Congestion management in high-performance interconnection networks using adaptive routing notifications. J Supercomput 79, 7804–7834 (2023). https://doi.org/10.1007/s11227-022-04926-1
DOI: https://doi.org/10.1007/s11227-022-04926-1