Architecture Support for Domain-Specific Accelerator-Rich CMPs

Authors: Mohammad Ali Ghodrat, Beayna Grigorian, Glenn Reinman

ACM Transactions on Embedded Computing Systems (TECS), Volume 13, Issue 4s

Article No.: 131, Pages 1 - 26

https://doi.org/10.1145/2584664

Published: 01 April 2014

Abstract

This work discusses hardware architectural support for domain-specific accelerator-rich CMPs. First, we present a hardware resource management scheme for sharing loosely coupled accelerators and arbitrating among multiple requesting cores. Second, we present a mechanism for accelerator virtualization. This allows a larger virtual accelerator to be composed efficiently out of multiple smaller accelerators, and allows multiple copies of a simple accelerator to collaborate. All of this work is supported by a fully automated simulation tool-chain for both accelerator generation and management. We demonstrate the applicability of our approach in four application domains: medical imaging, commercial, computer vision, and navigation. Our results show large performance improvements and energy savings over a software implementation. We also report additional improvements that result from enhanced load balancing and from simplifying the communication between the core and the arbitration mechanism.

Index Terms

1. Computer systems organization
  1. Architectures
    1. Other architectures
      1. Heterogeneous (hybrid) systems

Published In

ACM Transactions on Embedded Computing Systems Volume 13, Issue 4s

Special Issue on Real-Time and Embedded Technology and Applications, Domain-Specific Multicore Computing, Cross-Layer Dependable Embedded Systems, and Application of Concurrency to System Design (ACSD'13)

July 2014

571 pages

ISSN:1539-9087

EISSN:1558-3465

DOI:10.1145/2601432

Editors:
Sandeep K. Shukla (Virginia Tech, USA)
Josep Carmona (Universitat Politècnica de Catalunya, Spain)
Mihai Teodor Lazarescu (Politecnico di Torino, Italy)
Marta Pietkiewicz-Koutny (Newcastle University, UK)

Copyright © 2014 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Journal Family

ACM Journals for the Design of Smart and Connected Systems

Publication History

Published: 01 April 2014

Accepted: 01 December 2013

Revised: 01 November 2013

Received: 01 February 2013

Published in TECS Volume 13, Issue 4s

Qualifiers

Research-article
Research
Refereed
