Abstract
A key element in any system based on several interconnected computing and/or storage nodes is the interconnection network. Currently, one of the main concerns of designers of high-speed interconnection networks is how to improve network performance while using as few network resources as possible. To that end, in this paper we describe an efficient switch architecture suitable for any interconnect technology that implements deterministic source-based routing. This switch architecture uses the same network resources to provide two features that improve network performance: congestion management and QoS support. We also present results comparing the effectiveness of this architecture with that of other proposals typically used to provide these features in this context. These results have been obtained for synthetic traffic and for traces from parallel benchmarks and video frames. From the results, we conclude that, in every traffic scenario evaluated, our proposal is as effective as the previous ones while requiring fewer resources, and is thus much more cost-effective.
Notes
For the sake of simplicity, we assume 8-port switches and four SLs. Architectures with a different number of ports or SLs can be easily derived.
We use the term flit as the generic flow-control unit, but note that each interconnect technology may define its own flit size.
These values are adequate to monitor instantaneous traffic behavior.
Acknowledgements
We would like to thank and acknowledge the Intelligent Systems Group (ISG) of the University of the Basque Country (UPV/EHU) for their contribution to this paper.
This work has been jointly supported by the Spanish MINECO under the project TIN2012-38341-C04-04, and by the JCCM under the project POII10-0289-3724.
Cite this article
Villar, J.A., García, P.J., Alfaro, F.J. et al. An integrated solution for QoS provision and congestion management in high-performance interconnection networks using deterministic source-based routing. J Supercomput 66, 284–304 (2013). https://doi.org/10.1007/s11227-013-0904-0
DOI: https://doi.org/10.1007/s11227-013-0904-0