On-chip interconnection network for accelerator-rich architectures

Authors:

Bo Yuan

DAC '15: Proceedings of the 52nd Annual Design Automation Conference

Article No.: 8, Pages 1 - 6

https://doi.org/10.1145/2744769.2744879

Published: 07 June 2015

Abstract

Modern processors include hardware accelerators to provide high computation capability at low energy consumption. Because they implement specific functions directly in hardware, accelerators can improve performance and reduce energy consumption by orders of magnitude compared to general-purpose cores. However, hardware accelerators cannot hide memory and communication latency through extensive multi-threading, which increases the demand for efficient memory-interface and network-on-chip (NoC) designs.

In this paper, we explore the global management of NoCs in accelerator-rich architectures to provide predictable performance and energy efficiency. Accelerator memory accesses exhibit predictable patterns, creating highly utilized network paths. Leveraging these observations, we propose reserving NoC paths based on timing information from the global manager. We further maximize the benefit of path reservation by regularizing the communication traffic through TLB buffering and hybrid switching. Combined, these optimizations reduce total execution time by 11.3% over a packet-switched mesh NoC and by 8.5% over EVC [18] and a previous hybrid-switched NoC [29].
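To make the idea of reserving paths from timing information more concrete, the following is a minimal sketch, not the paper's implementation. It assumes a 2D mesh with dimension-ordered (XY) routing and discrete time slots, and it holds every link of a path for the whole reservation window, which is a simplification. The names `xy_route`, `PathReservationTable`, and `try_reserve` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set, Tuple

Node = Tuple[int, int]      # (x, y) router coordinate in the mesh (assumed layout)
Link = Tuple[Node, Node]    # directed link between two adjacent routers


def xy_route(src: Node, dst: Node) -> List[Link]:
    """Dimension-ordered (XY) route: move along x first, then along y."""
    links: List[Link] = []
    x, y = src
    while x != dst[0]:
        nx = x + (1 if dst[0] > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dst[1]:
        ny = y + (1 if dst[1] > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links


class PathReservationTable:
    """Hypothetical global-manager table of reserved time slots per link."""

    def __init__(self) -> None:
        self.reserved: Dict[Link, Set[int]] = {}

    def try_reserve(self, src: Node, dst: Node, start: int, length: int) -> bool:
        """Reserve all links on the XY path for slots [start, start + length).

        Returns False and leaves the table unchanged if any link conflicts.
        """
        path = xy_route(src, dst)
        slots = set(range(start, start + length))
        if any(self.reserved.get(link, set()) & slots for link in path):
            return False
        for link in path:
            self.reserved.setdefault(link, set()).update(slots)
        return True


table = PathReservationTable()
print(table.try_reserve((0, 0), (2, 1), start=10, length=4))  # True
print(table.try_reserve((1, 0), (2, 0), start=12, length=4))  # False: shares a link in slots 12-13
print(table.try_reserve((1, 0), (2, 0), start=14, length=4))  # True: same link, later window
```

The second request is rejected because it needs the link from (1, 0) to (2, 0) while the first reservation still holds it; shifting the window past slot 13 resolves the conflict. A real design would also have to fall back to packet switching when no reservation is available, which this sketch leaves out.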

References

[1] "The mobile robot programming toolkit." [Online]. Available: http://www.mrpt.org/

[2] N. Agarwal et al., "Garnet: A detailed on-chip network model inside a full-system simulator," in ISPASS, April, pp. 33--42.

[3] A. Bakhoda et al., "Throughput-effective on-chip networks for manycore accelerators," in MICRO, 2010.

[4] C. Bienia, "Benchmarking modern multiprocessors," Ph.D. dissertation, Princeton University, January 2011.

[5] N. Clark et al., "VEAL: Virtualized execution accelerator for loops," in ISCA, 2008.

[6] J. Cong et al., "Customizable domain-specific computing," Design & Test of Computers, IEEE, pp. 6--15, March 2011.

[7] J. Cong et al., "Architecture support for accelerator-rich CMPs," in DAC, 2012.

[8] J. Cong et al., "Architecture support for domain-specific accelerator-rich CMPs," ACM TECS, vol. 13, no. 4s, pp. 131:1--131:26, 2014.

[9] J. Cong et al., "BiN: a buffer-in-NUCA scheme for accelerator-rich CMPs," in ISLPED, 2012.

[10] J. Cong et al., "Optimization of interconnects between accelerators and shared memories in dark silicon," in ICCAD, 2013.

[11] C. F. Fajardo et al., "Buffer-integrated-cache: a cost-effective SRAM architecture for handheld and embedded platforms," in DAC, 2011.

[12] H. Franke et al., "Introduction to the wire-speed processor and architecture," IBM Journal of Research and Development, vol. 54, no. 1, pp. 3--1, 2010.

[13] K. Goossens et al., "Æthereal network on chip: concepts, architectures, and implementations," Design & Test of Computers, IEEE, vol. 22, no. 5, pp. 414--421, 2005.

[14] J. R. Hauser et al., "Garp: A MIPS processor with a reconfigurable coprocessor," in FPT, 1997.

[15] R. Hou et al., "Efficient data streaming with on-chip accelerators: Opportunities and challenges," in HPCA, 2011.

[16] N. D. E. Jerger et al., "Circuit-switched coherence," in NOCS, 2008.

[17] A. B. Kahng et al., "ORION 2.0: A fast and accurate NoC power and area model for early-stage design space exploration," in DATE, 2009.

[18] A. Kumar et al., "Express virtual channels: Towards the ideal interconnection fabric," in ISCA, 2007.

[19] M. J. Lyons et al., "The accelerator store: a shared memory framework for accelerator-based systems," TACO, vol. 8, no. 4, p. 48, 2012.

[20] P. Magnusson et al., "Simics: A full system simulation platform," Computer, vol. 35, no. 2, pp. 50--58, Feb.

[21] M. M. K. Martin et al., "Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset," SIGARCH Computer Architecture News, 2005.

[22] U. Nawathe et al., "An 8-core, 64-thread, 64-bit, power efficient SPARC SoC (Niagara 2)," ISSCC, 2007.

[23] H. Park et al., "Polymorphic pipeline array: a flexible multicore accelerator with virtualized execution for mobile multimedia applications," in MICRO, 2009.

[24] L. Seiler et al., "Larrabee: a many-core x86 architecture for visual computing," ACM Transactions on Graphics (TOG), vol. 27, no. 3, p. 18, 2008.

[25] S. R. Vangal et al., "An 80-tile sub-100-W teraflops processor in 65-nm CMOS," Solid-State Circuits, vol. 43, no. 1, pp. 29--41, 2008.

[26] D. Wentzlaff et al., "On-chip interconnection architecture of the tile processor," Micro, IEEE, pp. 15--31, 2007.

[27] D. Wiklund et al., "SoCBUS: Switched network on chip for hard real time embedded systems," in IPDPS, 2003.

[28] P. T. Wolkotte et al., "An energy-efficient reconfigurable circuit-switched network-on-chip," in IPDPS, 2005.

[29] J. Yin et al., "Energy-efficient time-division multiplexed hybrid-switched NoC for heterogeneous multicore systems," in IPDPS, 2014.

Index Terms

On-chip interconnection network for accelerator-rich architectures
1. Hardware


Published In

DAC '15: Proceedings of the 52nd Annual Design Automation Conference

June 2015

1204 pages

ISBN: 9781450335201

DOI: 10.1145/2744769

Copyright © 2015 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGDA: ACM Special Interest Group on Design Automation

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

Research-article

Funding Sources

C-FAR
National Science Foundation

Conference

DAC '15

Sponsor: SIGDA

DAC '15: The 52nd Annual Design Automation Conference 2015

June 7 - 11, 2015

San Francisco, California

Acceptance Rates

Overall Acceptance Rate 1,770 of 5,499 submissions, 32%


Cited By

Fang J, Wei Z, Liu Y, Hou Y (2023). TB-TBP: a task-based adaptive routing algorithm for network-on-chip in heterogenous CPU-GPU architectures. The Journal of Supercomputing, 80:5 (6311-6335). Online publication date: 23-Oct-2023. https://doi.org/10.1007/s11227-023-05700-7

Wu Y, Wang L, Wang X, Han J, Zhu J, Jiang H, Yin S, Wei S, Liu L (2022). Upward Packet Popup for Deadlock Freedom in Modular Chiplet-Based Systems. 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA) (986-1000). Online publication date: Apr-2022. https://doi.org/10.1109/HPCA53966.2022.00076

Jiao J, He Y, Cao T, Kondo M (2022). Enabling circuit-switching in modern on-chip networks. Microprocessors and Microsystems, 95 (104712). Online publication date: Nov-2022. https://doi.org/10.1016/j.micpro.2022.104712

Li Y, Louri A (2021). ALPHA: A Learning-Enabled High-Performance Network-on-Chip Router Design for Heterogeneous Manycore Architectures. IEEE Transactions on Sustainable Computing, 6:2 (274-288). Online publication date: 1-Apr-2021. https://doi.org/10.1109/TSUSC.2020.2981340

He Y, Jiao J, Kondo M (2021). Local Traffic-Based Energy-Efficient Hybrid Switching for On-Chip Networks. 2021 29th Euromicro International Conference on Parallel, Distributed and Network-Based Processing (PDP) (198-206). Online publication date: Mar-2021. https://doi.org/10.1109/PDP52278.2021.00039

He Y, Jiao J, Cao T, Kondo M, Mohsenin T, Zhao W, Chen Y, Mutlu O (2020). Energy-Efficient On-Chip Networks through Profiled Hybrid Switching. Proceedings of the 2020 on Great Lakes Symposium on VLSI (241-246). Online publication date: 7-Sep-2020. https://dl.acm.org/doi/10.1145/3386263.3406934

Bhardwaj K, Havasi M, Yao Y, Brooks D, Hernández-Lobato J, Wei G, Alonso D, Qiu Q, Reda S, Chen Y (2020). A comprehensive methodology to determine optimal coherence interfaces for many-accelerator SoCs. Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design (145-150). Online publication date: 10-Aug-2020. https://dl.acm.org/doi/10.1145/3370748.3406564

Yin J, Zhai A (2020). In-Network Memory Access Ordering for Heterogeneous Multicore Systems. 2020 14th IEEE/ACM International Symposium on Networks-on-Chip (NOCS) (1-8). Online publication date: 24-Sep-2020. https://doi.org/10.1109/NOCS50636.2020.9241583

Restuccia F, Pagani M, Biondi A, Marinoni M, Buttazzo G (2019). Is Your Bus Arbiter Really Fair? Restoring Fairness in AXI Interconnects for FPGA SoCs. ACM Transactions on Embedded Computing Systems, 18:5s (1-22). Online publication date: 7-Oct-2019. https://dl.acm.org/doi/10.1145/3358183

Wang L, Liu L, Han J, Wang X, Yin S, Wei S (2019). Achieving Flexible Global Reconfiguration in NoCs using Reconfigurable Rings. IEEE Transactions on Parallel and Distributed Systems (1-1). Online publication date: 2019. https://doi.org/10.1109/TPDS.2019.2940190
