Architecture Support for Domain-Specific Accelerator-Rich CMPs

Published: 01 April 2014

Abstract

This work discusses hardware architectural support for domain-specific, accelerator-rich chip multiprocessors (CMPs). First, we present a hardware resource management scheme that shares loosely coupled accelerators and arbitrates among multiple requesting cores. Second, we present a mechanism for accelerator virtualization. This mechanism lets several smaller accelerators be composed efficiently into a larger virtual accelerator, and lets multiple copies of a simple accelerator collaborate. All of this work is supported by a fully automated simulation tool-chain for both accelerator generation and management. We demonstrate the applicability of our approach in four application domains: medical imaging, commercial, computer vision, and navigation. Our results show large performance improvements and energy savings over a software implementation. We also show additional improvements that result from enhanced load balancing and from simplifying the communication between the core and the arbitration mechanism.
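As a rough illustration of the two ideas in the abstract (arbitration among requesting cores, and composing several small accelerators into one larger virtual accelerator), the toy software model below may help. It is only a sketch under stated assumptions: the class name `AcceleratorPool`, the strict first-come-first-served policy, and the all-or-nothing grants are hypothetical choices made for this example and are not taken from the paper, whose scheme is implemented in hardware.

```python
from collections import deque


class AcceleratorPool:
    """Toy arbiter for a pool of identical, loosely coupled accelerators.

    Hypothetical model for illustration only. Assumptions: requests are
    served in strict arrival order, a request for several units stands in
    for composing a larger virtual accelerator, and each core has at most
    one outstanding request.
    """

    def __init__(self, total):
        self.total = total      # number of physical accelerators
        self.free = total       # accelerators currently unallocated
        self.waiting = deque()  # (core_id, units) in arrival order
        self.granted = {}       # core_id -> units currently held

    def request(self, core_id, units):
        """Queue a request for `units` accelerators; return True if granted now."""
        if units < 1 or units > self.total:
            raise ValueError("request can never be satisfied")
        if core_id in self.granted or any(c == core_id for c, _ in self.waiting):
            raise ValueError("core already has an outstanding request")
        self.waiting.append((core_id, units))
        self._arbitrate()
        return core_id in self.granted

    def release(self, core_id):
        """Return a core's accelerators to the pool, then serve waiting cores."""
        self.free += self.granted.pop(core_id)
        self._arbitrate()

    def _arbitrate(self):
        # Strict FIFO: keep granting from the head of the queue while it fits.
        while self.waiting and self.waiting[0][1] <= self.free:
            core_id, units = self.waiting.popleft()
            self.free -= units
            self.granted[core_id] = units


pool = AcceleratorPool(total=4)
first = pool.request(0, 3)   # core 0 composes a 3-unit virtual accelerator
second = pool.request(1, 2)  # core 1 must wait: only 1 unit is free
third = pool.request(2, 1)   # would fit, but waits behind core 1 (strict FIFO)
pool.release(0)              # frees 3 units; cores 1 and 2 are now served
```

Strict FIFO keeps the model simple but causes head-of-line blocking, as the third request shows; a policy that lets small requests bypass a blocked head would improve utilization at the cost of possible starvation.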

    Published In

    ACM Transactions on Embedded Computing Systems, Volume 13, Issue 4s
    Special Issue on Real-Time and Embedded Technology and Applications, Domain-Specific Multicore Computing, Cross-Layer Dependable Embedded Systems, and Application of Concurrency to System Design (ACSD'13)
    July 2014
    571 pages
    ISSN:1539-9087
    EISSN:1558-3465
    DOI:10.1145/2601432

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 01 April 2014
    Accepted: 01 December 2013
    Revised: 01 November 2013
    Received: 01 February 2013
    Published in TECS Volume 13, Issue 4s

    Author Tags

    1. Chip multiprocessor
    2. accelerator sharing
    3. accelerator virtualization
    4. hardware accelerators

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Article Metrics

    • Downloads (last 12 months): 27
    • Downloads (last 6 weeks): 3
    Reflects downloads up to 03 Mar 2025

    Cited By

    • (2024) RELIEF: Relieving Memory Pressure In SoCs Via Data Movement-Aware Accelerator Scheduling. 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 1063-1079. DOI: 10.1109/HPCA57654.2024.00084. Online publication date: 2-Mar-2024
    • (2023) Hardware Implementation of RF Spectral Convergence System on DASH SoC. 2023 57th Asilomar Conference on Signals, Systems, and Computers, 1552-1558. DOI: 10.1109/IEEECONF59524.2023.10477039. Online publication date: 29-Oct-2023
    • (2022) The DASH SoC: Enabling the Next Generation of Multi-Function RF Systems. 2022 56th Asilomar Conference on Signals, Systems, and Computers, 905-912. DOI: 10.1109/IEEECONF56349.2022.10052029. Online publication date: 31-Oct-2022
    • (2021) Technological Advances to Facilitate Spectral Convergence. 2021 55th Asilomar Conference on Signals, Systems, and Computers, 623-628. DOI: 10.1109/IEEECONF53345.2021.9723312. Online publication date: 31-Oct-2021
    • (2020) Maximizing the Serviceability of Partially Reconfigurable FPGA Systems in Multi-tenant Environment. Proceedings of the 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 29-39. DOI: 10.1145/3373087.3375305. Online publication date: 23-Feb-2020
    • (2020) DS3: A System-Level Domain-Specific System-on-Chip Simulation Framework. IEEE Transactions on Computers, 1-1. DOI: 10.1109/TC.2020.2986963. Online publication date: 2020
    • (2019) Customizable Computing—From Single Chip to Datacenters. Proceedings of the IEEE 107:1, 185-203. DOI: 10.1109/JPROC.2018.2876372. Online publication date: Jan-2019
    • (2019) Understanding Performance Gains of Accelerator-Rich Architectures. 2019 IEEE 30th International Conference on Application-specific Systems, Architectures and Processors (ASAP), 239-246. DOI: 10.1109/ASAP.2019.00013. Online publication date: Jul-2019
    • (2018) High-Performance Architecture Using Fast Dynamic Reconfigurable Accelerators. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 26:7, 1209-1222. DOI: 10.1109/TVLSI.2018.2814627. Online publication date: 1-Jul-2018
    • (2017) Optimizing the heterogeneous network on-chip design in manycore architectures. 2017 30th IEEE International System-on-Chip Conference (SOCC), 184-189. DOI: 10.1109/SOCC.2017.8226033. Online publication date: Sep-2017
