DOI: 10.1145/1555754.1555804
research-article

Performance and power of cache-based reconfigurable computing

Published: 20 June 2009

Abstract

Many-cache is a memory architecture that efficiently supports caching in commercially available FPGAs. It facilitates FPGA programming for high-performance computing (HPC) developers by providing them with memory performance that is greater and power consumption that is less than their current CPU platforms, but without sacrificing their familiar, C-based programming environment.
Many-cache creates multiple, multi-banked caches on top of an FPGA's small, independent memories, each targeting a particular data structure or region of memory in an application and each customized for the memory operations that access it. The caches are automatically generated from C source by the CHiMPS C-to-FPGA compiler.
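The idea of giving each data structure its own cache can be illustrated with a toy software model. The sketch below is not the CHiMPS implementation and does not model banking. The cache geometry (`LINES`, `LINE_BYTES`), the direct-mapped policy, and the names `toy_cache`, `cache_access`, and `run_kernel` are all assumptions made for illustration. It counts hits for a loop that reads `a[i]` and then `b[i]`, first with one shared cache and then with one cache per array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LINES 16       /* lines per toy cache (assumed, not from the paper) */
#define LINE_BYTES 32  /* bytes per cache line (assumed) */

/* A minimal direct-mapped cache that only tracks tags and hit counts. */
typedef struct {
    uint32_t tag[LINES];
    int valid[LINES];
    unsigned hits;
    unsigned misses;
} toy_cache;

void cache_reset(toy_cache *c) {
    assert(c != NULL);
    memset(c, 0, sizeof *c);
}

/* Look up one byte address; on a miss, install the line. */
void cache_access(toy_cache *c, uint32_t addr) {
    assert(c != NULL);
    uint32_t line = addr / LINE_BYTES;
    uint32_t idx = line % LINES;
    uint32_t tag = line / LINES;
    if (c->valid[idx] && c->tag[idx] == tag) {
        c->hits++;
    } else {
        c->valid[idx] = 1;
        c->tag[idx] = tag;
        c->misses++;
    }
}

/* Model a loop that reads the 4-byte words a[i] and b[i] in turn.
 * split == 0: both arrays go through one shared cache.
 * split != 0: each array has a private cache.
 * Returns the total number of hits. */
unsigned run_kernel(int split, uint32_t base_a, uint32_t base_b,
                    unsigned n_words) {
    toy_cache ca, cb;
    cache_reset(&ca);
    cache_reset(&cb);
    toy_cache *for_a = &ca;
    toy_cache *for_b = split ? &cb : &ca;
    for (unsigned i = 0; i < n_words; i++) {
        cache_access(for_a, base_a + 4u * i);
        cache_access(for_b, base_b + 4u * i);
    }
    return ca.hits + (split ? cb.hits : 0u);
}
```

In this toy configuration, two arrays whose base addresses are 512 bytes apart map to the same cache lines. `run_kernel(0, 0, 512, 64)` then reports 0 hits because the arrays keep evicting each other, while `run_kernel(1, 0, 512, 64)` reports 112 hits out of 128 accesses.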
This paper presents the analyses and optimizations of the CHiMPS compiler that construct many-cache caches. An architectural evaluation of CHiMPS-generated FPGAs demonstrates a performance advantage of 7.8x (geometric mean) over CPU-only execution of the same source code, FPGA power usage that is on average 4.1x less, and consequently performance per watt that is also greater, by a geometric mean of 21.3x.
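The three headline figures can be related through a standard identity for geometric means. The sketch below assumes that all three were measured over the same n benchmarks. It writes s_i for the per-benchmark speedup and p_i for the per-benchmark power reduction (CPU power divided by FPGA power); these symbols are introduced here for illustration and do not appear in the abstract.

```latex
\[
\mathrm{GM}(s_i\,p_i)
  = \Bigl(\prod_{i=1}^{n} s_i\,p_i\Bigr)^{1/n}
  = \mathrm{GM}(s_i)\cdot\mathrm{GM}(p_i)
\quad\Longrightarrow\quad
\mathrm{GM}(p_i) = \frac{21.3}{7.8} \approx 2.73 .
\]
```

Under that assumption, the 4.1x power figure, which the abstract describes only as an average, would not be a geometric mean. An arithmetic mean is never smaller than the geometric mean of the same positive values, so 4.1x is consistent with an arithmetic-mean reading. That would explain why 7.8 x 4.1 does not equal 21.3.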




Published In

ISCA '09: Proceedings of the 36th annual international symposium on Computer architecture
June 2009
510 pages
ISBN: 9781605585260
DOI: 10.1145/1555754
  • ACM SIGARCH Computer Architecture News, Volume 37, Issue 3
    June 2009
    495 pages
    ISSN: 0163-5964
    DOI: 10.1145/1555815
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. c-to-gates
  2. c-to-hardware
  3. caches
  4. co-processor accelerator
  5. fpga
  6. many-cache
  7. synthesis compiler

Qualifiers

  • Research-article

Conference

ISCA '09

Acceptance Rates

Overall Acceptance Rate 543 of 3,203 submissions, 17%


Cited By

  • (2018) TAPAS. Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture, pp. 245-257. DOI: 10.1109/MICRO.2018.00028. Online publication date: 20-Oct-2018
  • (2018) MAX-PolyMem: High-Bandwidth Polymorphic Parallel Memories for DFEs. 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 107-114. DOI: 10.1109/IPDPSW.2018.00025. Online publication date: May-2018
  • (2017) A high level implementation and performance evaluation of level-I asynchronous cache on FPGA. Journal of King Saud University - Computer and Information Sciences 29:3, pp. 410-425. DOI: 10.1016/j.jksuci.2015.06.003. Online publication date: Jul-2017
  • (2016) Efficient data supply for hardware accelerators with prefetching and access/execute decoupling. The 49th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 1-12. DOI: 10.5555/3195638.3195694. Online publication date: 15-Oct-2016
  • (2016) Efficient data supply for hardware accelerators with prefetching and access/execute decoupling. 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 1-12. DOI: 10.1109/MICRO.2016.7783749. Online publication date: Oct-2016
  • (2016) Bridging the I/O performance gap for big data workloads: A new NVDIMM-based approach. 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 1-12. DOI: 10.1109/MICRO.2016.7783712. Online publication date: Oct-2016
  • (2015) Fusion. ACM SIGARCH Computer Architecture News 43:3S, pp. 733-745. DOI: 10.1145/2872887.2750421. Online publication date: 13-Jun-2015
  • (2015) Fusion. Proceedings of the 42nd Annual International Symposium on Computer Architecture, pp. 733-745. DOI: 10.1145/2749469.2750421. Online publication date: 13-Jun-2015
  • (2014) Energy-efficient reconfigurable cache architectures for accelerator-enabled embedded systems. 2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 211-220. DOI: 10.1109/ISPASS.2014.6844485. Online publication date: Mar-2014
  • (2012) ${\rm SPICE}^2$. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 31:1, pp. 9-22. DOI: 10.1109/TCAD.2011.2173199. Online publication date: 1-Jan-2012
