ABSTRACT
InfiniBand (IB) has become one of the most popular high-speed interconnects in High Performance Computing (HPC). The backpressure of IB's credit-based link-layer flow control causes congestion spreading, which increases queueing delay and prolongs application completion time. IB congestion control (IB CC) is defined in the IB specification to address the congestion spreading problem. Today, HPC clusters are increasingly used to run diverse workloads over a shared network infrastructure, and the coexistence of message transfers from different applications poses great challenges to IB CC. In this paper, we re-examine IB CC through fine-grained experimental observations and reveal several fundamental problems. Inspired by these insights, we present a new receiver-driven congestion control for InfiniBand (RR CC). RR CC includes two key mechanisms, receiver-driven congestion identification and receiver-driven rate regulation, which together enable it to eliminate both in-network congestion and endpoint congestion within a single control loop. RR CC has far fewer parameters and requires no modifications to InfiniBand switches. Evaluations show that RR CC achieves lower average and tail message latency and higher link utilization than IB CC under various scenarios.
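The backpressure effect of credit-based link-layer flow control can be illustrated with a toy, tick-based model. This is a minimal sketch and models neither IB CC nor RR CC: it assumes a single virtual lane, one credit per unit-sized packet (real IB accounts credits per virtual lane in fixed-size blocks rather than whole packets), a single sender-to-switch-to-receiver path, and a receiver that drains at half the line rate. `CreditLink` and `simulate` are hypothetical names. Because a hop may transmit only while it holds a credit, the slow receiver's full buffer stops the switch from forwarding, the switch's input buffer then fills, and the sender is throttled without any explicit congestion signal.

```python
from collections import deque

# Toy model with hypothetical names; not IB's actual credit accounting,
# IB CC, or RR CC. One credit corresponds to one free buffer slot.


class CreditLink:
    """One link-layer hop: a fixed-size receive buffer plus the credit
    counter held by the upstream sender."""

    def __init__(self, slots):
        self.buffer = deque()
        self.credits = slots  # one credit per empty slot at start

    def can_send(self):
        return self.credits > 0

    def send(self, packet):
        # Lossless: the upstream side never transmits without a credit.
        assert self.credits > 0, "sending without credit would overflow"
        self.credits -= 1
        self.buffer.append(packet)

    def drain_one(self):
        # Downstream consumes the head packet and returns its credit upstream.
        self.credits += 1
        return self.buffer.popleft()


def simulate(ticks, receiver_drain_every=2, slots=4):
    """Sender -> switch -> slow receiver. Logs, per tick, whether the sender
    injected a packet and the occupancy of the switch and receiver buffers."""
    to_switch = CreditLink(slots)  # hop 1: sender -> switch input buffer
    to_recv = CreditLink(slots)    # hop 2: switch -> receiver buffer
    log, seq = [], 0
    for t in range(ticks):
        # Bottleneck endpoint: the receiver consumes one packet every few ticks.
        if t % receiver_drain_every == 0 and to_recv.buffer:
            to_recv.drain_one()
        # The switch forwards one packet per tick, only if it holds a credit.
        if to_switch.buffer and to_recv.can_send():
            to_recv.send(to_switch.drain_one())
        # The sender injects one packet per tick, only if it holds a credit.
        sent = to_switch.can_send()
        if sent:
            to_switch.send(seq)
            seq += 1
        log.append((sent, len(to_switch.buffer), len(to_recv.buffer)))
    return log


if __name__ == "__main__":
    for t, (sent, sw, rx) in enumerate(simulate(20)):
        print(f"t={t:2d} sent={sent!s:5} switch_buf={sw} recv_buf={rx}")
```

With `simulate(40)`, the sender injects on every tick for the first 15 ticks; from tick 15 onward it injects only on every other tick, matching the receiver's drain rate, while both buffers stay full. In a real switch, that full input buffer would also block any other flow sharing it, which is the congestion spreading described above.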