DOI: 10.1145/3173162.3177158

Slim NoC: A Low-Diameter On-Chip Network Topology for High Energy Efficiency and Scalability

Published: 19 March 2018

Abstract

Emerging chips with hundreds and thousands of cores require networks with unprecedented energy/area efficiency and scalability. To address this, we propose Slim NoC (SN): a new on-chip network design that delivers significant improvements in efficiency and scalability compared to the state-of-the-art. The key idea is to use two concepts from graph and number theory, degree-diameter graphs combined with non-prime finite fields, to enable the smallest number of ports for a given core count. SN is inspired by state-of-the-art off-chip topologies; it identifies and distills their advantages for NoC settings while solving several key issues that lead to significant overheads on-chip. SN provides NoC-specific layouts, which further enhance area/energy efficiency. We show how to augment SN with state-of-the-art router microarchitecture schemes such as Elastic Links, to make the network even more scalable and efficient. Our extensive experimental evaluations show that SN outperforms both traditional low-radix topologies (e.g., meshes and tori) and modern high-radix networks (e.g., various Flattened Butterflies) in area, latency, throughput, and static/dynamic power consumption for both synthetic and real workloads. SN provides a promising direction in scalable and energy-efficient NoC topologies.
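
To make the port-count argument concrete, the sketch below uses the Moore bound, a standard graph-theory result that caps how many routers a network of a given radix and diameter can contain. It is not the SN construction itself, and the function names (`moore_bound`, `min_radix_for`, `mesh_diameter`) are illustrative rather than taken from the paper. The computed radix counts router-to-router ports only and is a lower bound that a real topology may not reach.

```python
def moore_bound(radix: int, diameter: int) -> int:
    """Largest possible number of routers in a network where every router
    has at most `radix` router-to-router ports and any two routers are at
    most `diameter` hops apart (the Moore bound)."""
    if radix < 1 or diameter < 1:
        raise ValueError("radix and diameter must be at least 1")
    total = 1          # the router we start from
    frontier = radix   # routers reachable in exactly one hop
    for _ in range(diameter):
        total += frontier
        # each newly reached router has (radix - 1) ports left to fan out
        frontier *= radix - 1
    return total


def min_radix_for(routers: int, diameter: int) -> int:
    """Smallest radix whose Moore bound admits `routers` routers.
    This is a lower bound only; a matching graph need not exist."""
    if routers < 1 or diameter < 1:
        raise ValueError("routers and diameter must be at least 1")
    if routers > 3 and diameter == 1 and False:
        pass  # placeholder branch kept out of the logic; see loop below
    radix = 1
    # radix 1 and 2 saturate quickly, so the loop always moves past them
    while moore_bound(radix, diameter) < routers:
        radix += 1
    return radix


def mesh_diameter(side: int) -> int:
    """Diameter in hops of a side x side 2D mesh (corner to opposite corner)."""
    if side < 1:
        raise ValueError("side must be at least 1")
    return 2 * (side - 1)


if __name__ == "__main__":
    print(moore_bound(7, 2))       # degree 7, diameter 2
    print(min_radix_for(200, 2))   # lower bound on radix for 200 routers
    print(mesh_diameter(14))       # 14 x 14 mesh, 196 routers
```

Under these assumptions, a diameter-2 network of 200 routers needs at least 15 network ports per router, while a 14×14 mesh (196 routers) gets by with 4 ports but has a diameter of 26 hops. Degree-diameter constructions aim to approach the low-radix end of this trade-off while keeping the diameter small.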

Published In

ASPLOS '18: Proceedings of the Twenty-Third International Conference on Architectural Support for Programming Languages and Operating Systems
March 2018
827 pages
ISBN: 9781450349116
DOI: 10.1145/3173162
  • ACM SIGPLAN Notices, Volume 53, Issue 2
    ASPLOS '18
    February 2018
    809 pages
    ISSN: 0362-1340
    EISSN: 1558-1160
    DOI: 10.1145/3296957
Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. energy efficiency
  2. many-core systems
  3. on-chip-networks
  4. parallel processing
  5. scalability

Qualifiers

  • Research-article

Conference

ASPLOS '18

Acceptance Rates

ASPLOS '18 paper acceptance rate: 56 of 319 submissions (18%)
Overall acceptance rate: 535 of 2,713 submissions (20%)

Article Metrics

  • Downloads (last 12 months): 64
  • Downloads (last 6 weeks): 5
Reflects downloads up to 23 Feb 2025

Cited By

  • (2024) A comprehensive study and holistic review of empowering network-on-chip application mapping through machine learning techniques. Discover Electronics 1:1. DOI: 10.1007/s44291-024-00027-w. Online publication date: 24-Oct-2024
  • (2022) Generating Brain-Network-Inspired Topologies for Large-Scale NoCs on Monolithic 3D ICs. IEEE Transactions on Circuits and Systems II: Express Briefs 69:3 (1552-1556). DOI: 10.1109/TCSII.2021.3107340. Online publication date: Mar-2022
  • (2022) Hermes: Accelerating Long-Latency Load Requests via Perceptron-Based Off-Chip Load Prediction. Proceedings of the 55th Annual IEEE/ACM International Symposium on Microarchitecture (1-18). DOI: 10.1109/MICRO56248.2022.00015. Online publication date: 1-Oct-2022
  • (2022) Design of New Scalable Network on Chip Architecture using Adaptive Group based Routing Algorithm. 2022 IEEE 7th International conference for Convergence in Technology (I2CT) (1-6). DOI: 10.1109/I2CT54291.2022.9824621. Online publication date: 7-Apr-2022
  • (2021) A Latency-Optimized Network-on-Chip with Rapid Bypass Channels. Micromachines 12:6 (621). DOI: 10.3390/mi12060621. Online publication date: 27-May-2021
  • (2021) S-SMART++: A Low-Latency NoC Leveraging Speculative Bypass Requests. IEEE Transactions on Computers 70:6 (819-832). DOI: 10.1109/TC.2021.3068615. Online publication date: 1-Jun-2021
  • (2021) Adapt-NoC: A Flexible Network-on-Chip Design for Heterogeneous Manycore Architectures. 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA) (723-735). DOI: 10.1109/HPCA51647.2021.00066. Online publication date: Feb-2021
  • (2021) Pitstop: Enabling a Virtual Network Free Network-on-Chip. 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA) (682-695). DOI: 10.1109/HPCA51647.2021.00063. Online publication date: Feb-2021
  • (2020) FatPaths. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (1-18). DOI: 10.5555/3433701.3433736. Online publication date: 9-Nov-2020
  • (2020) Implementing Low-Diameter On-Chip Networks for Manycore Processors Using a Tiled Physical Design Methodology. 2020 14th IEEE/ACM International Symposium on Networks-on-Chip (NOCS) (1-8). DOI: 10.1109/NOCS50636.2020.9241710. Online publication date: 24-Sep-2020
