Abstract
Modern high-speed interconnection networks include support for providing quality of service (QoS) to applications. The output scheduling algorithm plays an important role in QoS provision, since it chooses which packets are delivered from the output buffers. InfiniBand, one of the most widely used interconnection technologies, includes a table-based scheduler composed of a high-priority table and a low-priority table, plus a counter that limits how much high-priority traffic may be delivered before low-priority traffic is given an opportunity. Therefore, the performance of the traffic flows in the network largely depends on the table configuration, since the switch scheduler uses this information to decide whether packets are forwarded, according to the QoS provision scheme. As far as we know, there is no study of the influence of these configurations on the performance of the traffic flows. In this paper, we present an offline analysis tool to accurately determine the expected end-to-end latency and bandwidth of the traffic flows in an InfiniBand-based network using the information contained in the high- and low-priority tables. Moreover, we present a methodology to aid network administrators in configuring the QoS provision in a real InfiniBand cluster. Finally, we evaluate the analysis tool by comparing its results with those obtained from a real cluster and from simulation.
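As an illustration of the table-based arbitration summarized above, the following Python sketch models a scheduler with a high-priority table, a low-priority table, and a limit on high-priority traffic. It is a simplified model under stated assumptions, not the analysis tool presented in this paper and not the exact behavior mandated by the InfiniBand specification. The names `VirtualLane`, `TableArbiter`, and `high_limit` are hypothetical, and weights and the limit are expressed directly in bytes for simplicity, whereas the specification uses its own units.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class VirtualLane:
    """Hypothetical VL model: queued packet sizes (bytes) and available credits (bytes)."""
    queue: deque = field(default_factory=deque)
    credits: int = 0

    def is_active(self) -> bool:
        # A VL is active when it stores packets and has credits to send at least one.
        return bool(self.queue) and self.credits >= self.queue[0]


class TableArbiter:
    """Simplified two-table weighted round-robin arbiter (illustrative assumption)."""

    def __init__(self, vls, high_table, low_table, high_limit):
        self.vls = vls                                  # dict: VL id -> VirtualLane
        self.tables = {"high": high_table, "low": low_table}  # lists of (vl_id, weight)
        self.pos = {"high": 0, "low": 0}                # current entry in each table
        self.left = {p: (t[0][1] if t else 0) for p, t in self.tables.items()}
        self.high_limit = high_limit                    # assumed to be in bytes
        self.high_sent = 0                              # high-priority bytes since last low packet

    def _pick(self, prio):
        """Return the VL id to serve from one table, or None if no entry can send."""
        table = self.tables[prio]
        if not table:
            return None
        # Check the current entry, then every other entry, then the current one
        # again with its weight reloaded.
        for _ in range(len(table) + 1):
            vl_id, _weight = table[self.pos[prio]]
            if self.left[prio] > 0 and self.vls[vl_id].is_active():
                return vl_id
            self.pos[prio] = (self.pos[prio] + 1) % len(table)
            self.left[prio] = table[self.pos[prio]][1]
        return None

    def next_packet(self):
        """Return (priority, vl_id, size) for the next packet, or None when idle."""
        low_waiting = any(
            w > 0 and self.vls[v].is_active() for v, w in self.tables["low"]
        )
        order = ["high", "low"]
        if low_waiting and self.high_sent > self.high_limit:
            order = ["low", "high"]  # give low-priority traffic its opportunity
        for prio in order:
            vl_id = self._pick(prio)
            if vl_id is None:
                continue
            vl = self.vls[vl_id]
            size = vl.queue.popleft()
            vl.credits -= size
            self.left[prio] -= size
            self.high_sent = self.high_sent + size if prio == "high" else 0
            return prio, vl_id, size
        return None
```

In this sketch, a larger `high_limit` lets high-priority VLs send more bytes back to back before a waiting low-priority VL is served, which is the kind of configuration effect on latency and bandwidth that the analysis tool aims to quantify.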





Notes
A VL is active when it stores packets and has credits to send at least one packet.
Acknowledgements
This work has been supported by the Junta de Comunidades de Castilla-La Mancha, European Commission (FEDER funds) and Ministerio de Ciencia, Innovación y Universidades under projects SBPLY/17/180501/000498 and RTI2018-098156-B-C52, respectively. It is also co-financed by the University of Castilla-La Mancha and Fondo Europeo de Desarrollo Regional funds under project 2019-GRIN-27060. Javier Cano-Cano is also funded by the MINECO under FPI grant BES-2016-078800.
Cite this article
Cano-Cano, J., Andújar, F.J., Escudero-Sahuquillo, J. et al. A methodology to enable QoS provision on InfiniBand hardware. J Supercomput 77, 9934–9946 (2021). https://doi.org/10.1007/s11227-021-03667-x
DOI: https://doi.org/10.1007/s11227-021-03667-x