research-article

Low-Power On-Chip Network Providing Guaranteed Services for Snoopy Coherent and Artificial Neural Network Systems

Authors:

Bhavya K. Daya,

Anantha P. Chandrakasan

DAC '17: Proceedings of the 54th Annual Design Automation Conference 2017

Article No.: 87, Pages 1 - 6

https://doi.org/10.1145/3061639.3062278

Published: 18 June 2017

Abstract

In the transition to packet-switched on-chip networks, we lose the relative timing and ordering of requests, which are essential for shared-memory coherence and for communicating spikes in hardware-based artificial neural networks. We present a bufferless network architecture that enforces time-based sharing of multi-hop single-cycle paths, providing guaranteed services at low cost. We guarantee ordered delivery of requests, fixed network latency, and jitter-free neural spikes. In a 64-node network, we achieve 84% lower latency and 7.5× higher throughput than SCORPIO. Full-system 36-core simulations show 9% lower runtime than SCORPIO, with 39% lower power and 36% lower area.
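The core idea in the abstract, time-based sharing of a multi-hop path, can be illustrated with a deliberately simplified time-division model. This is a sketch under stated assumptions, not the authors' design: it assumes slot t of a global schedule belongs to node t mod N in round-robin order, that the slot owner's request crosses the shared path in a fixed number of cycles (`traversal_cycles`), and that every destination receives it in that same cycle. The names `Request`, `tdm_schedule`, and `traversal_cycles` are hypothetical. Under these assumptions each slot has exactly one owner, so every node observes the same total order of requests, and the injection-to-delivery latency is identical for all requests; only the wait for the source's own slot varies.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Request:
    src: int    # injecting node
    ready: int  # cycle at which the request is ready at its source
    tag: str    # label used to track ordering


def tdm_schedule(
    requests: Iterable[Request], num_nodes: int, traversal_cycles: int = 1
) -> List[Tuple[int, int, Request]]:
    """Return (delivery_cycle, slot, request) tuples in global delivery order.

    Hypothetical model: slot t is owned by node t % num_nodes, each owner
    sends at most one request per slot, and a request sent in slot t reaches
    every destination at cycle t + traversal_cycles.
    """
    next_free: Dict[int, int] = {}  # per source: earliest slot it may still use
    deliveries = []
    for req in sorted(requests, key=lambda r: (r.ready, r.src)):
        start = max(req.ready, next_free.get(req.src, 0))
        # Wait until the next slot owned by this source.
        slot = start + (req.src - start) % num_nodes
        next_free[req.src] = slot + 1
        deliveries.append((slot + traversal_cycles, slot, req))
    # Slots never collide, so this is one total order shared by all nodes.
    deliveries.sort(key=lambda d: d[0])
    return deliveries


reqs = [Request(2, 0, "A"), Request(0, 0, "B"), Request(2, 1, "C"), Request(1, 5, "D")]
order = [r.tag for _, _, r in tdm_schedule(reqs, num_nodes=4)]
print(order)  # → ['B', 'A', 'D', 'C']
```

In this toy model the fixed traversal latency also removes jitter: two spikes arrive exactly as far apart as the slots in which they were sent. Round-robin slot ownership is chosen only for simplicity; a real design may allocate slots differently.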

References

[1] "First the tick, now the tock: Next generation Intel microarchitecture (Nehalem)." http://www.intel.com/content/dam/doc/white-paper/intel-microarchitecture-white-paper.pdf, 2008.

[2] "Oracle's SPARC T5-2, SPARC T5-4, SPARC T5-8, and SPARC T5-1B Server Architecture." http://www.oracle.com/technetwork/server-storage/sun-sparc-enterprise/documentation/o13-024-sparc-t5-architecture-1920540.pdf.

[3] "Intel Xeon Processor E7 Family." http://www.intel.com/content/www/us/en/processors/xeon/xeon-processor-e7-family.html.

[4] S. Pande, F. Morgan, G. Smit, T. Bruintjes, J. Rutgers, B. McGinley, S. Cawley, J. Harkin, and L. McDaid, "Fixed latency on-chip interconnect for hardware spiking neural network architectures," Parallel Computing, vol. 39, no. 9, pp. 357--371, 2013.

[5] K. Goossens, J. Dielissen, and A. Radulescu, "Aethereal network on chip: concepts, architectures, and implementations," IEEE Design & Test of Computers, pp. 414--421, 2005.

[6] T. Bjerregaard and J. Sparso, "A router architecture for connection-oriented service guarantees in the MANGO clockless network-on-chip," in Design, Automation and Test in Europe, vol. 2, pp. 1226--1231, 2005.

[7] B. K. Daya, C.-H. O. Chen, S. Subramanian, W.-C. Kwon, S. Park, T. Krishna, J. Holt, A. P. Chandrakasan, and L.-S. Peh, "SCORPIO: A 36-core research chip demonstrating snoopy coherence on a scalable mesh NoC with in-network ordering," in Proceedings of the 41st Annual International Symposium on Computer Architecture, ISCA '14, 2014.

[8] T. Krishna and L.-S. Peh, "Single-cycle collective communication over a shared network fabric," in Networks-on-Chip (NoCS), 2014 Eighth IEEE/ACM International Symposium on, pp. 1--8, Sept 2014.

[9] T. Krishna, C.-H. Chen, W. C. Kwon, and L.-S. Peh, "Breaking the on-chip latency barrier using SMART," in High Performance Computer Architecture, 2013 IEEE 19th International Symposium on, 2013.

[10] W. S. McCulloch and W. Pitts, "A logical calculus of the ideas immanent in nervous activity," in Neurocomputing: Foundations of Research, pp. 15--27, 1988.

[11] C.-H. O. Chen, S. Park, T. Krishna, S. Subramanian, A. P. Chandrakasan, and L.-S. Peh, "SMART: A single-cycle reconfigurable NoC for SoC applications," in Design, Automation & Test in Europe Conference & Exhibition (DATE), 2013, pp. 338--343, March 2013.

[12] N. Binkert, B. Beckmann, G. Black, S. K. Reinhardt, A. Saidi, A. Basu, J. Hestness, D. R. Hower, T. Krishna, S. Sardashti, R. Sen, K. Sewell, M. Shoaib, N. Vaish, M. D. Hill, and D. A. Wood, "The gem5 simulator," SIGARCH Comput. Archit. News, 2011.

[13] N. Agarwal, T. Krishna, L.-S. Peh, and N. K. Jha, "GARNET: A Detailed On-Chip Network Model Inside a Full-System Simulator," in ISPASS, 2009.

[14] W. J. Dally and B. Towles, Principles and Practices of Interconnection Networks. Morgan Kaufmann Publishers, 2004.

[15] B. K. Daya, L.-S. Peh, and A. P. Chandrakasan, "Quest for high-performance bufferless NoCs with single-cycle express paths and self-learning throttling," in Proceedings of the 53rd Annual Design Automation Conference, pp. 36:1--36:6, 2016.

[16] P. McKinley, H. Xu, A.-H. Esfahanian, and L. Ni, "Unicast-based multicast communication in wormhole-routed networks," Parallel and Distributed Systems, IEEE Transactions on, pp. 1252--1265, 1994.

[17] N. Jerger, L.-S. Peh, and M. Lipasti, "Virtual circuit tree multicasting: A case for on-chip hardware multicast support," in Computer Architecture, 2008. ISCA '08. 35th International Symposium on, pp. 229--240, June 2008.

[18] L. Wang, Y. Jin, H. Kim, and E. J. Kim, "Recursive partitioning multicast: A bandwidth-efficient routing for networks-on-chip," in Proceedings of the 2009 3rd ACM/IEEE International Symposium on Networks-on-Chip, NOCS '09, pp. 64--73, 2009.

[19] P. Abad, V. Puente, and J. Gregorio, "MRR: Enabling fully adaptive multicast routing for CMP interconnection networks," in High Performance Computer Architecture, 2009. HPCA 2009. IEEE 15th International Symposium on, pp. 355--366, Feb 2009.

[20] M. Daneshtalab, M. Ebrahimi, S. Mohammadi, and A. Afzali-Kusha, "Low-distance path-based multicast routing algorithm for network-on-chips," Computers & Digital Techniques, IET, pp. 430--442, 2009.

[21] M. Marty and M. Hill, "Coherence ordering for ring-based chip multiprocessors," in Microarchitecture, 2006. MICRO-39. 39th Annual IEEE/ACM International Symposium on, pp. 309--320, Dec 2006.

[22] K. Strauss, X. Shen, and J. Torrellas, "Uncorq: Unconstrained Snoop Request Delivery in Embedded-Ring Multiprocessors," in MICRO, 2007.

[23] C. Feng, Z. Lu, A. Jantsch, M. Zhang, and X. Yang, "Support efficient and fault-tolerant multicast in bufferless network-on-chip," IEICE Transactions, pp. 1052--1061, 2012.

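Cited By

A. Das, M. Palesi, J. Kim, and P. Pratim Pande (2024). "Chip and Package-Scale Interconnects for General-Purpose, Domain-Specific, and Quantum Computing Systems—Overview, Challenges, and Opportunities," IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 14, no. 3, pp. 354-370. Online publication date: Sep-2024. https://doi.org/10.1109/JETCAS.2024.3445829

S. Gade, M. Sinha, M. Kumar, and S. Deb (2022). "Scalable Hybrid Cache Coherence Using Emerging Links for Chiplet Architectures," 2022 35th International Conference on VLSI Design and 2022 21st International Conference on Embedded Systems (VLSID), pp. 92-97. Online publication date: Feb-2022. https://doi.org/10.1109/VLSID2022.2022.00029

P. Kale, P. Hazarika, S. Jain, and B. Bhowmik (2022). "Performance Evaluation in 2D NoCs Using ANN," Advanced Information Networking and Applications, pp. 360-369. Online publication date: 31-Mar-2022. https://doi.org/10.1007/978-3-030-99619-2_34

S. Gade and S. Deb (2021). "A Novel Hybrid Cache Coherence with Global Snooping for Many-core Architectures," ACM Transactions on Design Automation of Electronic Systems, vol. 27, no. 1, pp. 1-31. Online publication date: 13-Sep-2021. https://dl.acm.org/doi/10.1145/3462775

A. Franques, A. Kokolis, S. Abadal, V. Fernando, S. Misailovic, and J. Torrellas (2021). "WiDir: A Wireless-Enabled Directory Cache Coherence Protocol," 2021 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pp. 304-317. Online publication date: Feb-2021. https://doi.org/10.1109/HPCA51647.2021.00034

B. Wang, J. Zhou, W. Wong, and L. Peh (2020). "Shenjing: A low power reconfigurable neuromorphic accelerator with partial-sum and spike networks-on-chip," 2020 Design, Automation & Test in Europe Conference & Exhibition (DATE), pp. 240-245. Online publication date: Mar-2020. https://doi.org/10.23919/DATE48585.2020.9116516

C. Tan, M. Karunaratne, T. Mitra, and L. Peh (2018). "Stitch," Proceedings of the 45th Annual International Symposium on Computer Architecture, pp. 575-587. Online publication date: 2-Jun-2018. https://dl.acm.org/doi/10.1109/ISCA.2018.00054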

Published In

DAC '17: Proceedings of the 54th Annual Design Automation Conference 2017

June 2017

533 pages

ISBN:9781450349277

DOI:10.1145/3061639

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

EDAC: Electronic Design Automation Consortium
SIGDA: ACM Special Interest Group on Design Automation
IEEE-CEDA

In-Cooperation

SIGBED: ACM Special Interest Group on Embedded Systems

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

DAC '17

Sponsor:

EDAC
SIGDA

DAC '17: The 54th Annual Design Automation Conference 2017

June 18 - 22, 2017

Austin, TX, USA

Acceptance Rates

Overall Acceptance Rate 1,770 of 5,499 submissions, 32%

Bibliometrics & Citations

Article Metrics

Total Citations: 6
Total Downloads: 288
Downloads (last 12 months): 8
Downloads (last 6 weeks): 0

Reflects downloads up to 16 Feb 2025
