A methodology to enable QoS provision on InfiniBand hardware

The Journal of Supercomputing

Abstract

Modern high-speed interconnection networks include support for providing quality of service (QoS) to applications. The output scheduling algorithm plays an important role in QoS provision, since it chooses which packets are delivered from the output buffers. InfiniBand, one of the most widely used interconnection technologies, includes a table-based scheduler composed of a high-priority table and a low-priority table, plus a counter that limits how much high-priority traffic may be delivered before low-priority traffic is given an opportunity. The performance of the traffic flows in the network therefore depends largely on the table configuration, because the switch scheduler uses this information to allow or deny packet forwarding according to the QoS provision scheme. As far as we know, there is no study of the influence of these configurations on traffic-flow performance. In this paper, we present an offline analysis tool to accurately determine the expected end-to-end latency and bandwidth of the traffic flows in an InfiniBand-based network using the information contained in the high- and low-priority tables. Moreover, we present a methodology to aid network administrators in configuring QoS provision in a real InfiniBand cluster. Finally, we evaluate the analysis tool by comparing its results with those obtained from a real cluster and from simulation.
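
To make the table-based scheduler described above more concrete, the following is a minimal sketch of a two-table arbiter with a high-priority limit. It is not the analysis tool presented in the paper and not the exact InfiniBand arbitration algorithm: the function name `arbitrate`, the data layout, and the choice to count weights and the limit in whole packets (rather than in bytes) are all simplifying assumptions made for illustration.

```python
from collections import deque


def arbitrate(queues, high_table, low_table, high_limit, slots):
    """Return the sequence of virtual lanes (VLs) served in `slots` packet slots.

    Assumptions of this sketch (not taken from the paper):
      queues     -- dict mapping a VL id to a deque of waiting packets
      high_table -- list of (vl, weight) entries; weight = packets per visit
      low_table  -- same layout, used for low-priority traffic
      high_limit -- high-priority packets that may be sent before a waiting
                    low-priority packet is given one opportunity
    """
    tables = {"high": high_table, "low": low_table}
    idx = {"high": 0, "low": 0}         # current entry in each table
    left = {"high": None, "low": None}  # weight left in that entry (None = not loaded)
    served = []
    high_sent = 0                       # high-priority packets since the last low one

    def ready(table):
        # A table is ready if some entry with non-zero weight has a queued packet.
        return any(queues.get(vl) for vl, weight in table if weight > 0)

    def pick(name):
        table = tables[name]
        # Try the current entry, then every other entry once (round robin).
        for _ in range(len(table) + 1):
            vl, weight = table[idx[name]]
            if left[name] is None:
                left[name] = weight
            if left[name] > 0 and queues.get(vl):
                left[name] -= 1
                return vl
            idx[name] = (idx[name] + 1) % len(table)
            left[name] = None
        return None

    for _ in range(slots):
        high_ready, low_ready = ready(high_table), ready(low_table)
        if high_ready and (high_sent < high_limit or not low_ready):
            vl = pick("high")
            high_sent += 1
        elif low_ready:
            vl = pick("low")
            high_sent = 0
        else:
            break                       # nothing left to send
        queues[vl].popleft()
        served.append(vl)
    return served


# Hypothetical configuration: VL0 and VL1 are high priority, VL2 is low priority.
demo_queues = {vl: deque(range(6)) for vl in (0, 1, 2)}
print(arbitrate(demo_queues, [(0, 2), (1, 1)], [(2, 1)], high_limit=3, slots=8))
# prints [0, 0, 1, 2, 0, 0, 1, 2]
```

In this toy configuration the high-priority table gives VL0 twice the share of VL1, and the limit of three lets VL2 send one packet after every three high-priority packets. This illustrates why the bandwidth each flow obtains depends on the table contents.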

Notes

  1. A VL is active when it stores packets and has credits to send at least one packet.
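
The definition of an active VL in the note above can be read as a simple predicate. The sketch below is one possible interpretation, assuming that "credits to send at least one packet" means the available credits cover the packet at the head of the queue; the class `VirtualLane`, its field names, and the use of bytes as the credit unit are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualLane:
    # Sizes (in bytes) of the packets waiting in this VL, head of queue first.
    queued_packet_sizes: list = field(default_factory=list)
    # Flow-control credits currently available, expressed in bytes (assumed unit).
    credits: int = 0


def is_active(vl: VirtualLane) -> bool:
    """A VL is active when it stores packets and has credits for at least one."""
    return bool(vl.queued_packet_sizes) and vl.credits >= vl.queued_packet_sizes[0]
```

Under this reading, a VL with queued packets but insufficient credits is skipped by the scheduler, just like an empty VL.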

Acknowledgements

This work has been supported by the Junta de Comunidades de Castilla-La Mancha, European Commission (FEDER funds) and Ministerio de Ciencia, Innovación y Universidades under projects SBPLY/17/180501/000498 and RTI2018-098156-B-C52, respectively. It is also co-financed by the University of Castilla-La Mancha and Fondo Europeo de Desarrollo Regional funds under project 2019-GRIN-27060. Javier Cano-Cano is also funded by the MINECO under FPI grant BES-2016-078800.

Author information

Corresponding author

Correspondence to Javier Cano-Cano.

About this article

Cite this article

Cano-Cano, J., Andújar, F.J., Escudero-Sahuquillo, J. et al. A methodology to enable QoS provision on InfiniBand hardware. J Supercomput 77, 9934–9946 (2021). https://doi.org/10.1007/s11227-021-03667-x

  • DOI: https://doi.org/10.1007/s11227-021-03667-x
