Slim NoC: A Low-Diameter On-Chip Network Topology for High Energy Efficiency and Scalability

Authors:

Syed Minhaj Hassan,

Sudhakar Yalamanchili,

Rachata Ausavarungnirun,

Torsten Hoefler

ASPLOS '18: Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems

Pages 43 - 55

https://doi.org/10.1145/3173162.3177158

Published: 19 March 2018

Abstract

Emerging chips with hundreds and thousands of cores require networks with unprecedented energy/area efficiency and scalability. To address this, we propose Slim NoC (SN): a new on-chip network design that delivers significant improvements in efficiency and scalability compared to the state-of-the-art. The key idea is to use two concepts from graph and number theory, degree-diameter graphs combined with non-prime finite fields, to enable the smallest number of ports for a given core count. SN is inspired by state-of-the-art off-chip topologies; it identifies and distills their advantages for NoC settings while solving several key issues that lead to significant overheads on-chip. SN provides NoC-specific layouts, which further enhance area/energy efficiency. We show how to augment SN with state-of-the-art router microarchitecture schemes such as Elastic Links, to make the network even more scalable and efficient. Our extensive experimental evaluations show that SN outperforms both traditional low-radix topologies (e.g., meshes and tori) and modern high-radix networks (e.g., various Flattened Butterflies) in area, latency, throughput, and static/dynamic power consumption for both synthetic and real workloads. SN provides a promising direction in scalable and energy-efficient NoC topologies.
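
To make the "smallest number of ports for a given core count" idea concrete, the sketch below computes the Moore bound, the standard upper limit from the degree-diameter problem on how many vertices a graph of a given maximum degree and diameter can have. It is an illustration only, not the SN construction: the function names are hypothetical, the 16 x 16 router grid is an assumed example size, and only router-to-router ports are counted (cores attached per router are ignored). The mesh and flattened butterfly figures use the textbook formulas for those topologies.

```python
def moore_bound(degree: int, diameter: int) -> int:
    """Moore bound: the most vertices any graph with this max degree and diameter can have."""
    if degree < 1 or diameter < 1:
        raise ValueError("degree and diameter must be positive")
    return 1 + degree * sum((degree - 1) ** i for i in range(diameter))


def min_degree_for(routers: int, diameter: int = 2) -> int:
    """Smallest degree whose Moore bound admits `routers` vertices.

    This is only a lower bound on the network radix; a graph that
    actually reaches it may not exist for a given size.
    """
    degree = 1
    while moore_bound(degree, diameter) < routers:
        degree += 1
    return degree


def mesh_2d(side: int) -> tuple[int, int]:
    """(max network radix, diameter) of a side x side 2D mesh, for side >= 3."""
    return 4, 2 * (side - 1)


def flattened_butterfly_2d(side: int) -> tuple[int, int]:
    """(network radix, diameter) of a side x side 2D flattened butterfly,
    where each router links to every other router in its row and column."""
    return 2 * (side - 1), 2


if __name__ == "__main__":
    side = 16                      # assumed example: 16 x 16 = 256 routers
    routers = side * side
    print("2D mesh (radix, diameter):", mesh_2d(side))
    print("2D flattened butterfly (radix, diameter):", flattened_butterfly_2d(side))
    print("diameter-2 radix lower bound:", min_degree_for(routers, 2))
```

For 256 routers this prints (4, 30) for the mesh, (30, 2) for the flattened butterfly, and 16 for the diameter-2 lower bound. The gap between 16 and 30 ports at the same diameter is the kind of headroom that a degree-diameter-based topology aims to exploit.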

Published In

ASPLOS '18: Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems

March 2018

827 pages

ISBN:9781450349116

DOI:10.1145/3173162

General Chairs: Xipeng Shen (North Carolina State University, USA), James Tuck (North Carolina State University, USA)

Program Chairs: Ricardo Bianchini (Microsoft Research, USA), Vivek Sarkar (Georgia Institute of Technology, USA)

ACM SIGPLAN Notices Volume 53, Issue 2
ASPLOS '18
February 2018
809 pages
ISSN:0362-1340
EISSN:1558-1160
DOI:10.1145/3296957
Editor: Matthew Fluet (Rochester Institute of Technology)

Copyright © 2018 ACM.

Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Sponsors

In-Cooperation

SIGBED: ACM Special Interest Group on Embedded Systems

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

ASPLOS '18

ASPLOS '18: Architectural Support for Programming Languages and Operating Systems

March 24 - 28, 2018

Williamsburg, VA, USA

Acceptance Rates

ASPLOS '18 Paper Acceptance Rate 56 of 319 submissions, 18%;

Overall Acceptance Rate 535 of 2,713 submissions, 20%

Bibliometrics

Total Citations: 28
Total Downloads: 760
Downloads (Last 12 months): 64
Downloads (Last 6 weeks): 5

Reflects downloads up to 23 Feb 2025

Cited By

Asadi Y (2024). A comprehensive study and holistic review of empowering network-on-chip application mapping through machine learning techniques. Discover Electronics 1:1. Online publication date: 24-Oct-2024.
https://doi.org/10.1007/s44291-024-00027-w
Ge M, Ni X, Chen S, Kang Y (2022). Generating Brain-Network-Inspired Topologies for Large-Scale NoCs on Monolithic 3D ICs. IEEE Transactions on Circuits and Systems II: Express Briefs 69:3 (1552-1556). Online publication date: Mar-2022.
https://doi.org/10.1109/TCSII.2021.3107340
Bera R, Kanellopoulos K, Balachandran S, Novo D, Olgun A, Sadrosadati M, Mutlu O, Hardavellas N, Campanoni S, Grot B, Karpuzcu U (2022). Hermes: Accelerating Long-Latency Load Requests via Perceptron-Based Off-Chip Load Prediction. Proceedings of the 55th Annual IEEE/ACM International Symposium on Microarchitecture (1-18). Online publication date: 1-Oct-2022.
https://dl.acm.org/doi/10.1109/MICRO56248.2022.00015
Bampire D, Mudaheranwa E, Ngaboyera E, Urimubenshi F, Masengo G, Ndayisenga G (2022). Design of New Scalable Network on Chip Architecture using Adaptive Group based Routing Algorithm. 2022 IEEE 7th International conference for Convergence in Technology (I2CT) (1-6). Online publication date: 7-Apr-2022.
https://doi.org/10.1109/I2CT54291.2022.9824621
Ma W, Gao X, Gao Y, Yu N (2021). A Latency-Optimized Network-on-Chip with Rapid Bypass Channels. Micromachines 12:6 (621). Online publication date: 27-May-2021.
https://doi.org/10.3390/mi12060621
Perez I, Vallejo E, Beivide R (2021). S-SMART++: A Low-Latency NoC Leveraging Speculative Bypass Requests. IEEE Transactions on Computers 70:6 (819-832). Online publication date: 1-Jun-2021.
https://doi.org/10.1109/TC.2021.3068615
Zheng H, Wang K, Louri A (2021). Adapt-NoC: A Flexible Network-on-Chip Design for Heterogeneous Manycore Architectures. 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA) (723-735). Online publication date: Feb-2021.
https://doi.org/10.1109/HPCA51647.2021.00066
Farrokhbakht H, Kao H, Hasan K, Gratz P, Krishna T, San Miguel J, Jerger N (2021). Pitstop: Enabling a Virtual Network Free Network-on-Chip. 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA) (682-695). Online publication date: Feb-2021.
https://doi.org/10.1109/HPCA51647.2021.00063
Besta M, Schneider M, Konieczny M, Cynk K, Henriksson E, Di Girolamo S, Singla A, Hoefler T, Cuicchi C, Qualters I, Kramer W (2020). FatPaths. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (1-18). Online publication date: 9-Nov-2020.
https://dl.acm.org/doi/10.5555/3433701.3433736
Ou Y, Agwa S, Batten C (2020). Implementing Low-Diameter On-Chip Networks for Manycore Processors Using a Tiled Physical Design Methodology. 2020 14th IEEE/ACM International Symposium on Networks-on-Chip (NOCS) (1-8). Online publication date: 24-Sep-2020.
https://doi.org/10.1109/NOCS50636.2020.9241710
