ABSTRACT
Driven by the demands of communication systems, field-programmable gate array (FPGA) devices have significantly increased their aggregate transceiver bandwidth, which reaches terabits per second in the upcoming generation. This paper asks whether a single-chip switch fabric can be built that saturates the available transceiver bandwidth.
In answering this question, we propose a new switch fabric organization, called the Grouped Crosspoint Queued switch, that offers significantly better memory efficiency than state-of-the-art organizations. This makes it possible to build high-bandwidth, high-radix switches directly on an FPGA that rival ASIC performance. The proposal was validated at small scale with a 16x16, 160 Gbps switch on a currently available Virtex-6 device, and simulated at larger scale as a fat-tree switching network with 5 Tbps capacity.
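The memory-efficiency argument can be made concrete with some back-of-the-envelope arithmetic. The sketch below is illustrative only and assumes two things not stated in the abstract. First, it assumes the 160 Gbps aggregate capacity of the 16x16 prototype is split evenly across ports, which gives 10 Gbps per port. Second, it uses a baseline crosspoint-queued (CQ) switch that places one buffer at every crosspoint. The abstract does not describe how the grouped organization reduces memory, so the sketch covers only the baseline. The helper names are hypothetical.

```python
def per_port_rate_gbps(aggregate_gbps: float, ports: int) -> float:
    """Per-port line rate, ASSUMING the aggregate capacity is split evenly across ports."""
    return aggregate_gbps / ports


def crosspoint_buffers(radix: int) -> int:
    """Buffer count of a baseline N x N crosspoint-queued switch: one buffer per crosspoint."""
    return radix * radix


if __name__ == "__main__":
    # 16x16 prototype with 160 Gbps aggregate capacity (figures from the abstract).
    print(per_port_rate_gbps(160, 16))  # 10.0
    # Quadratic growth in buffers is what makes on-chip memory the limiting resource at high radix.
    for n in (16, 32, 64, 128):
        print(n, crosspoint_buffers(n))
```

Doubling the radix quadruples the number of crosspoint buffers (256 at radix 16, 16384 at radix 128). This quadratic growth is why memory efficiency, rather than logic, tends to bound how large a single-chip crosspoint-queued fabric can grow.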
Index Terms
- Saturating the transceiver bandwidth: switch fabric design on FPGAs