research-article

Performance and power of cache-based reconfigurable computing

Authors:

Eric Dellinger,

Prasanna Sundararajan,

Ralph Wittig

ISCA '09: Proceedings of the 36th annual international symposium on Computer architecture

Pages 395 - 405

https://doi.org/10.1145/1555754.1555804

Published: 20 June 2009

Abstract

Many-cache is a memory architecture that efficiently supports caching in commercially available FPGAs. It facilitates FPGA programming for high-performance computing (HPC) developers by providing them with memory performance that is greater and power consumption that is less than their current CPU platforms, but without sacrificing their familiar, C-based programming environment.

Many-cache creates multiple, multi-banked caches on top of an FPGA's small, independent memories, each targeting a particular data structure or region of memory in an application and each customized for the memory operations that access it. The caches are automatically generated from C source by the CHiMPS C-to-FPGA compiler.
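To make the idea of a multi-banked cache concrete, the following is a minimal software sketch, not the hardware that CHiMPS generates. It models one direct-mapped cache whose lines are interleaved across several independent memories. The names (`banked_cache`, `bank_of`, `index_of`, `tag_of`, `cache_access`) and every size constant are hypothetical choices made for illustration; the abstract does not specify them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* All sizes are illustrative assumptions, not values from the paper. */
#define LINE_BYTES     16u  /* bytes per cache line */
#define NUM_BANKS       4u  /* independent memories backing one cache */
#define LINES_PER_BANK 64u  /* lines stored in each bank */

/* Tag store only: enough to decide hit or miss for an address. */
typedef struct {
    bool     valid[NUM_BANKS][LINES_PER_BANK];
    uint32_t tag[NUM_BANKS][LINES_PER_BANK];
} banked_cache;

/* Mark every line invalid. */
static void cache_init(banked_cache *c) { memset(c, 0, sizeof *c); }

/* Consecutive lines go to consecutive banks, so a sequential
 * access stream is spread across all banks. */
static uint32_t bank_of(uint32_t addr) {
    return (addr / LINE_BYTES) % NUM_BANKS;
}

/* Line slot within the selected bank. */
static uint32_t index_of(uint32_t addr) {
    return (addr / (LINE_BYTES * NUM_BANKS)) % LINES_PER_BANK;
}

/* Remaining high-order bits identify which line occupies the slot. */
static uint32_t tag_of(uint32_t addr) {
    return addr / (LINE_BYTES * NUM_BANKS * LINES_PER_BANK);
}

/* Returns true on a hit. On a miss, installs the line (direct-mapped
 * replacement) and returns false. */
static bool cache_access(banked_cache *c, uint32_t addr) {
    assert(c != NULL);
    uint32_t b = bank_of(addr), i = index_of(addr), t = tag_of(addr);
    if (c->valid[b][i] && c->tag[b][i] == t) {
        return true;
    }
    c->valid[b][i] = true;
    c->tag[b][i] = t;
    return false;
}
```

In this sketch, a program would declare one `banked_cache` per data structure or memory region, call `cache_init` once, and route each load or store address through `cache_access`. Interleaving lines across banks is one common way to let accesses to different banks proceed independently; whether many-cache uses this exact mapping is an assumption here.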

This paper presents the analyses and optimizations of the CHiMPS compiler that construct many-cache caches. An architectural evaluation of CHiMPS-generated FPGAs demonstrates a performance advantage of 7.8x (geometric mean) over CPU-only execution of the same source code, FPGA power usage that is on average 4.1x less, and consequently performance per watt that is also greater, by a geometric mean of 21.3x.
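The three headline figures are linked benchmark by benchmark, since the gain in performance per watt is the speedup multiplied by the power reduction. The short derivation below spells out that arithmetic. The symbols $s_i$ (speedup), $p_i$ (power reduction) and $n$ (number of benchmarks) are introduced here for illustration and do not come from the paper.

```latex
\[
\frac{(\text{perf}/\text{W})_{\text{FPGA},i}}{(\text{perf}/\text{W})_{\text{CPU},i}} = s_i \, p_i,
\qquad
\Bigl(\prod_{i=1}^{n} s_i \, p_i\Bigr)^{1/n}
= \Bigl(\prod_{i=1}^{n} s_i\Bigr)^{1/n} \Bigl(\prod_{i=1}^{n} p_i\Bigr)^{1/n}
\]
\[
7.8 \times 4.1 \approx 32.0 \neq 21.3,
\qquad
\frac{21.3}{7.8} \approx 2.73 \le 4.1
\]
```

If all three figures were geometric means over the same benchmarks, they would multiply exactly. They do not, which would be consistent with the 4.1x power figure being an arithmetic mean (an arithmetic mean is never smaller than the geometric mean of the same values). The abstract says only "on average", so this reading is an assumption rather than something the source confirms.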

Index Terms

Performance and power of cache-based reconfigurable computing
1. Computer systems organization
  1. Architectures
    1. Other architectures
2. Hardware
  1. Integrated circuits
    1. Semiconductor memory
      1. Dynamic memory

Published In

ISCA '09: Proceedings of the 36th annual international symposium on Computer architecture

June 2009

510 pages

ISBN:9781605585260

DOI:10.1145/1555754

General Chair: Steve Keckler (University of Texas at Austin)
Program Chair: Luiz André Barroso (Google Inc.)

ACM SIGARCH Computer Architecture News Volume 37, Issue 3
June 2009
495 pages
ISSN:0163-5964
DOI:10.1145/1555815

Copyright © 2009 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

ISCA '09: The 36th Annual International Symposium on Computer Architecture

June 20 - 24, 2009

Austin, TX, USA

Acceptance Rates

Overall Acceptance Rate 543 of 3,203 submissions, 17%

Bibliometrics

Total Citations: 34
Total Downloads: 1,558
Downloads (Last 12 months): 11
Downloads (Last 6 weeks): 0

Reflects downloads up to 28 Feb 2025

Cited By

Margerm S, Sharifian A, Guha A, Shriraman A, Pokam G, Oskin M, Inoue K (2018) TAPAS. Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, 245-257. Online publication date: 20-Oct-2018
https://dl.acm.org/doi/10.1109/MICRO.2018.00028
Ciobanu C, Stramondo G, de Laat C, Varbanescu A (2018) MAX-PolyMem: High-Bandwidth Polymorphic Parallel Memories for DFEs. 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 107-114. Online publication date: May-2018
https://doi.org/10.1109/IPDPSW.2018.00025
Jhamb M, Sharma R, Gupta A (2017) A high level implementation and performance evaluation of level-I asynchronous cache on FPGA. Journal of King Saud University - Computer and Information Sciences 29:3, 410-425. Online publication date: Jul-2017
https://doi.org/10.1016/j.jksuci.2015.06.003
Chen T, Suh G, Hsu W, Yang C, Lipasti M, Lee H (2016) Efficient data supply for hardware accelerators with prefetching and access/execute decoupling. The 49th Annual IEEE/ACM International Symposium on Microarchitecture, 1-12. Online publication date: 15-Oct-2016
https://dl.acm.org/doi/10.5555/3195638.3195694
Chen T, Suh G (2016) Efficient data supply for hardware accelerators with prefetching and access/execute decoupling. 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 1-12. Online publication date: Oct-2016
https://doi.org/10.1109/MICRO.2016.7783749
Chen R, Shao Z, Li T (2016) Bridging the I/O performance gap for big data workloads: A new NVDIMM-based approach. 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 1-12. Online publication date: Oct-2016
https://doi.org/10.1109/MICRO.2016.7783712
Kumar S, Shriraman A, Vedula N (2015) Fusion. ACM SIGARCH Computer Architecture News 43:3S, 733-745. Online publication date: 13-Jun-2015
https://dl.acm.org/doi/10.1145/2872887.2750421
Kumar S, Shriraman A, Vedula N, Marr D, Albonesi D (2015) Fusion. Proceedings of the 42nd Annual International Symposium on Computer Architecture, 733-745. Online publication date: 13-Jun-2015
https://dl.acm.org/doi/10.1145/2749469.2750421
Farmahini-Farahani A, Kim N, Morrow K (2014) Energy-efficient reconfigurable cache architectures for accelerator-enabled embedded systems. 2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 211-220. Online publication date: Mar-2014
https://doi.org/10.1109/ISPASS.2014.6844485
Kapre N, DeHon A (2012) SPICE². IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 31:1, 9-22. Online publication date: 1-Jan-2012
https://dl.acm.org/doi/10.1109/TCAD.2011.2173199
