research-article

Simple Virtual Channel Allocation for High-Throughput and High-Frequency On-Chip Routers

Published: 21 May 2015

Abstract

Packet-switched networks-on-chip (NoCs) provide a scalable communication fabric for tiled multicore processors. However, the virtual channel (VC) buffers in a NoC consume significant dynamic and leakage power. To improve the energy efficiency of the router, it is advantageous to use small buffers while still maintaining network throughput. This article proposes two new virtual channel allocation (VA) mechanisms, termed fixed VC assignment with dynamic VC allocation (FVADA) and adjustable VC assignment with dynamic VC allocation (AVADA). VCs are designated to output ports, and packets are allocated VCs according to this assignment, which helps reduce head-of-line blocking. The VC-to-output-port assignment can also be adjusted dynamically to accommodate traffic changes. Simulation results show that both mechanisms can improve network throughput by 41% on average, and evaluation with real traffic shows a network latency reduction of up to 66%. In addition, AVADA can outperform the baseline in throughput with only half the buffer size. Finally, the proposed mechanisms achieve throughput comparable to or better than that of a previous dynamic VC allocator while reducing the critical path delay by 57% relative to that allocator. Hence, the proposed VA mechanisms are suitable for low-power, high-throughput, and high-frequency NoC designs.
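The abstract only outlines how VCs are tied to output ports. As a minimal sketch under stated assumptions (not the authors' implementation), the Python model below treats the VCs of one router input port as designated to output ports: a packet whose next output port is `p` may take any free VC designated to `p`, so a packet waiting on a congested port never occupies a VC that packets bound for other ports could use. The class `InputPortVCs`, its methods, the round-robin initial designation, and the `adjust` policy (move one idle VC from the least-blocked port to the most-blocked port) are all hypothetical illustrations of the fixed and adjustable variants.

```python
from dataclasses import dataclass, field


@dataclass
class InputPortVCs:
    """Hypothetical model of the virtual channels (VCs) at one router input port.

    Assumption for illustration: every VC is designated to exactly one output
    port, and a packet may only be allocated a free VC designated to the output
    port it will take next.
    """
    num_vcs: int
    ports: list
    assignment: list = field(default_factory=list)        # VC index -> output port
    busy: list = field(init=False, default_factory=list)  # VC index -> in use?

    def __post_init__(self):
        if not self.assignment:
            # Initial designation: spread the VCs round-robin over the output ports.
            self.assignment = [self.ports[i % len(self.ports)]
                               for i in range(self.num_vcs)]
        self.busy = [False] * self.num_vcs

    def allocate(self, out_port):
        """Dynamically grant any free VC designated to out_port, else None."""
        for vc, port in enumerate(self.assignment):
            if port == out_port and not self.busy[vc]:
                self.busy[vc] = True
                return vc
        # No free VC for this port: the packet waits, but VCs designated to
        # other ports stay available to packets heading elsewhere.
        return None

    def release(self, vc):
        """Free a VC once the tail flit of its packet has left."""
        self.busy[vc] = False

    def adjust(self, blocked):
        """Hypothetical adjustment policy: move one idle VC from the least-blocked
        output port to the most-blocked one.

        `blocked` maps output port -> number of failed allocation requests.
        Every port keeps at least one designated VC.
        """
        hot = max(self.ports, key=lambda p: blocked.get(p, 0))
        if blocked.get(hot, 0) == 0:
            return None  # no contention observed; keep the current designation
        counts = {p: self.assignment.count(p) for p in self.ports}
        donors = [vc for vc, p in enumerate(self.assignment)
                  if p != hot and not self.busy[vc] and counts[p] > 1]
        if not donors:
            return None
        vc = min(donors, key=lambda v: blocked.get(self.assignment[v], 0))
        self.assignment[vc] = hot
        return vc


# Usage: 8 VCs shared by four output ports, two VCs per port initially.
vcs = InputPortVCs(num_vcs=8, ports=["N", "E", "S", "W"])
print(vcs.allocate("E"), vcs.allocate("E"), vcs.allocate("E"))  # 1 5 None
print(vcs.adjust({"E": 1}))  # 0 (VC 0 moves from N to E)
print(vcs.allocate("E"))     # 0
```

In this sketch a third request for `E` fails until `adjust` re-designates an idle VC, which mirrors, in simplified form, the idea of a fixed assignment versus one that adapts to traffic. Keeping at least one VC per port is a design choice of the sketch so that no output port is starved.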

Cited By

  • (2023) ReDeSIGN: Reuse of Debug Structures for Improvement in Performance Gain of NoC Based MPSoCs. IEEE Transactions on Emerging Topics in Computing 11, 2, 432-447. DOI: 10.1109/TETC.2022.3203611. Online publication date: 1-Apr-2023
  • (2017) ProNoC. Microprocessors & Microsystems 54, C, 60-74. DOI: 10.1016/j.micpro.2017.08.007. Online publication date: 1-Oct-2017

    Published In

    ACM Transactions on Parallel Computing, Volume 2, Issue 1
    Special Issue on SPAA 2012
    May 2015
    202 pages
    ISSN:2329-4949
    EISSN:2329-4957
    DOI:10.1145/2757213
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 21 May 2015
    Accepted: 01 November 2014
    Revised: 01 January 2014
    Received: 01 August 2013
    Published in TOPC Volume 2, Issue 1

    Author Tags

    1. chip multiprocessor
    2. dynamic allocation
    3. network-on-chip
    4. thousand-core
    5. virtual channel
    6. virtual channel allocation

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Article Metrics

    • Downloads (last 12 months): 8
    • Downloads (last 6 weeks): 0
    Reflects downloads up to 05 Mar 2025
