Congestion management in high-performance interconnection networks using adaptive routing notifications

The Journal of Supercomputing

Abstract

The interconnection network is a crucial subsystem in High-Performance Computing clusters and data centers, guaranteeing high bandwidth and low latency to the applications’ communication operations. Unfortunately, congestion may degrade network performance unless the network design applies specific countermeasures. Adaptive routing algorithms are a traditional approach to dealing with congestion, since they provide traffic flows with alternative routes that bypass congested areas. However, adaptive routing decisions at switches are typically based on local information, without a global view of network traffic, which leads congestion to spread throughout the network beyond the originally congested areas. In this paper, we propose a new, efficient congestion management strategy that leverages the adaptive routing notifications currently available in some interconnect technologies and isolates the congesting flows in reserved spaces at switch buffers. Experimental results based on simulations of realistic traffic scenarios show that our proposal removes the impact of congestion.
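The idea of isolating congesting flows in reserved buffer space can be illustrated with a toy model. The sketch below is not the mechanism proposed in the paper; it only shows, under simplifying assumptions, why moving packets of a flagged destination into a separate queue keeps them from blocking other traffic at the head of a FIFO. The class `PortBuffer`, its methods, and the destination names are hypothetical and chosen for illustration only.

```python
from collections import deque


class PortBuffer:
    """Toy input-port buffer: one normal FIFO plus one reserved FIFO.

    Hypothetical model for illustration; packets are represented only
    by their destination identifier.
    """

    def __init__(self):
        self.normal = deque()
        self.reserved = deque()        # assumed reserved space for congesting flows
        self.congested_dests = set()   # destinations flagged by notifications

    def notify_congestion(self, dest):
        """Record an (assumed) notification that `dest` is congested."""
        self.congested_dests.add(dest)

    def enqueue(self, dest):
        """Store an arriving packet; flagged destinations go to the reserved FIFO."""
        queue = self.reserved if dest in self.congested_dests else self.normal
        queue.append(dest)

    def dequeue(self, blocked_dests):
        """Return the next forwardable packet, or None.

        Only queue heads are eligible, so a blocked head stalls its own queue.
        """
        for queue in (self.normal, self.reserved):
            if queue and queue[0] not in blocked_dests:
                return queue.popleft()
        return None


def forwarded(isolate):
    """Packets that can leave the buffer while destination "hot" is blocked."""
    buf = PortBuffer()
    if isolate:
        buf.notify_congestion("hot")
    for dest in ["hot", "cold1", "hot", "cold2"]:
        buf.enqueue(dest)
    out = []
    while (pkt := buf.dequeue(blocked_dests={"hot"})) is not None:
        out.append(pkt)
    return out


if __name__ == "__main__":
    print(forwarded(isolate=False))  # single FIFO: the blocked head stalls everything
    print(forwarded(isolate=True))   # reserved FIFO: other traffic still drains
```

In this toy setting, the single-queue case forwards nothing while the hot destination is blocked, whereas the isolated case still forwards the packets for the other destinations.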

Data Availability

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

Notes

  1. Note that the term CLOS is currently employed, mainly in the data-center context, to refer to topologies similar (if not identical) to Fat Trees. Although the original CLOS network was a three-stage non-blocking unidirectional topology, the fact that a Fat Tree can be built by recursively applying the CLOS construction method (followed by folding) has led to the terms CLOS and Fat Tree being used interchangeably.

  2. Note that a link failure may also generate a congestion situation, as the network bisection bandwidth decreases due to this failure.

  3. Thanks to the adaptive routing notification (ARN) mechanism, the switches also monitor link status to detect link failures. Hereafter, we focus only on the congestion-monitoring functionality.

  4. We assume the same criterion as defined in Sect. A10.1.2 of the InfiniBand specification.

Acknowledgements

This work is part of the R&D Project Grant PID2019-109001RA-I00, funded by MCIN/AEI/10.13039/501100011033. Moreover, this work has also been jointly supported by Junta de Comunidades de Castilla-La Mancha under the project SBPLY/17/180501/000498.

Author information

Corresponding author

Correspondence to Jose Rocher-Gonzalez.

Ethics declarations

Conflict of interest

All authors declare that they have no conflicts of interest.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Rocher-Gonzalez, J., Escudero-Sahuquillo, J., Garcia, P.J. et al. Congestion management in high-performance interconnection networks using adaptive routing notifications. J Supercomput 79, 7804–7834 (2023). https://doi.org/10.1007/s11227-022-04926-1

  • DOI: https://doi.org/10.1007/s11227-022-04926-1
