research-article

Simple Virtual Channel Allocation for High-Throughput and High-Frequency On-Chip Routers

Authors:

Jun Yang

ACM Transactions on Parallel Computing (TOPC), Volume 2, Issue 1

Article No.: 6, Pages 1 - 23

https://doi.org/10.1145/2742349

Published: 21 May 2015

Abstract

Packet-switched networks-on-chip (NoCs) provide a scalable communication solution for tiled multicore processors. However, the virtual channel (VC) buffers in a NoC consume significant dynamic and leakage power. To improve the energy efficiency of the router design, it is advantageous to use small buffers while still maintaining network throughput. This article proposes two new virtual channel allocation (VA) mechanisms: fixed VC assignment with dynamic VC allocation (FVADA) and adjustable VC assignment with dynamic VC allocation (AVADA). VCs are designated to output ports and allocated to packets according to that assignment, which helps reduce head-of-line blocking. The VC-to-output-port assignment can also be adjusted dynamically to accommodate traffic changes. Simulation results show that both mechanisms can improve network throughput by 41% on average. Real traffic evaluation shows a network latency reduction of up to 66%. In addition, AVADA can outperform the baseline in throughput with only half of the buffer size. Finally, the proposed mechanisms achieve comparable or better throughput than a previous dynamic VC allocator while reducing its critical path delay by 57%. Hence, the proposed VA mechanisms are suitable for low-power, high-throughput, and high-frequency NoC designs.
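The idea of designating VCs to output ports can be illustrated with a small toy model. The sketch below is an assumption-laden illustration based only on the abstract, not the article's router microarchitecture: the round-robin initial assignment, the "reassign only idle VCs" rule, and the names `fixed_assignment`, `allocate_vc`, and `reassign_vc` are all hypothetical.

```python
from typing import Dict, List, Optional, Set


def fixed_assignment(num_vcs: int, ports: List[str]) -> Dict[int, str]:
    """Designate each VC index to an output port.

    Round-robin designation is an assumption made for this sketch.
    """
    return {vc: ports[vc % len(ports)] for vc in range(num_vcs)}


def allocate_vc(out_port: str, assignment: Dict[int, str],
                busy: Set[int]) -> Optional[int]:
    """Give a packet bound for out_port a free VC designated to that port.

    Returns the VC index, or None if every VC designated to out_port is busy.
    Packets bound for different output ports never share a VC here, so a
    stalled packet cannot sit in front of traffic for another port.
    """
    for vc, port in sorted(assignment.items()):
        if port == out_port and vc not in busy:
            busy.add(vc)
            return vc
    return None


def reassign_vc(assignment: Dict[int, str], busy: Set[int],
                vc: int, new_port: str) -> bool:
    """Re-designate an idle VC to another output port (hypothetical policy).

    Refuses to touch a VC that currently holds a packet.
    """
    if vc in busy:
        return False
    assignment[vc] = new_port
    return True
```

With four VCs and two output ports, two packets bound for the same port each obtain their own VC and a third must wait, unless an idle VC designated to the other port is reassigned first. That reassignment step is the kind of adjustment to traffic changes the abstract describes, modelled here in the simplest possible way.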


Cited By

Rout S, M B, Sinha M, Deb S (2023). ReDeSIGN: Reuse of Debug Structures for Improvement in Performance Gain of NoC Based MPSoCs. IEEE Transactions on Emerging Topics in Computing 11:2 (432-447). Online publication date: 1-Apr-2023.
https://doi.org/10.1109/TETC.2022.3203611
Monemi A, Tang J, Palesi M, Marsono M (2017). ProNoC. Microprocessors & Microsystems 54:C (60-74). Online publication date: 1-Oct-2017.
https://dl.acm.org/doi/10.1016/j.micpro.2017.08.007

Index Terms

1. Networks
  1. Network protocols


Published In

ACM Transactions on Parallel Computing Volume 2, Issue 1

Special Issue on SPAA 2012

May 2015

202 pages

ISSN:2329-4949

EISSN:2329-4957

DOI:10.1145/2757213

Editor:
Phillip B. Gibbons
Intel Labs, Pittsburgh, USA

Copyright © 2015 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 21 May 2015

Accepted: 01 November 2014

Revised: 01 January 2014

Received: 01 August 2013

Published in TOPC Volume 2, Issue 1

Qualifiers

Research-article
Research
Refereed

Funding Sources

National Science Foundation


