Area-Efficient Near-Associative Memories on FPGAs

Authors:
Udit Dhawan, University of Pennsylvania, Philadelphia, PA
André DeHon, University of Pennsylvania, Philadelphia, PA

ACM Transactions on Reconfigurable Technology and Systems, Volume 7, Issue 4, Article No. 30, pp. 1–22. https://doi.org/10.1145/2629471

Published: 23 January 2015

Abstract

Associative memories can map sparsely used keys to values with low latency but can incur heavy area overheads. The lack of customized hardware for associative memories in today’s mainstream FPGAs exacerbates the overhead cost of building these memories using the fixed-address-match BRAMs. In this article, we develop a new, FPGA-friendly memory system architecture based on a multiple hash scheme that is able to achieve near-associative performance without the area-delay overheads of a fully associative memory on FPGAs. At the same time, we develop a novel memory management algorithm that allows us to statistically mimic an associative memory. Using the proposed architecture as a 64KB L1 data cache, we show that it is able to achieve near-associative miss rates while consuming 3–13× fewer FPGA memory resources for a set of benchmark programs from the SPEC CPU2006 suite than fully associative memories generated by the Xilinx Coregen tool. Benefits for our architecture increase with key width, allowing area reduction up to 100×. Mapping delay is also reduced to 3.7 ns for a 1,024-entry flat version or 6.1 ns for an area-efficient version, compared to 17.6 ns for a fully associative memory, for a 64-bit key on a Xilinx Virtex 6 device.
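
The abstract does not describe the architecture in detail, so the following is only a rough software illustration of the general multiple-hash idea, not the authors' design or their memory management algorithm. It assumes a table split into several banks, one independent hash function per bank, and stored keys that are compared on lookup. The class name `MultiHashTable`, the salted BLAKE2b hash, and the evict-from-bank-0 policy are all hypothetical choices made for this sketch.

```python
import hashlib


class MultiHashTable:
    """Toy multiple-hash table: each key has one candidate slot per bank.

    Illustrative only; this is NOT the architecture from the article.
    """

    def __init__(self, num_banks=4, slots_per_bank=256):
        # num_banks must stay below 256 because the bank number is used
        # as a one-byte hash salt (an assumption of this sketch).
        self.num_banks = num_banks
        self.slots_per_bank = slots_per_bank
        # Each slot is None or a (key, value) pair. Keeping the key lets
        # a lookup confirm that the slot really belongs to that key.
        self.banks = [[None] * slots_per_bank for _ in range(num_banks)]

    def _index(self, bank, key):
        # Hypothetical hash family: BLAKE2b salted with the bank number.
        digest = hashlib.blake2b(
            repr(key).encode(), digest_size=8, salt=bytes([bank])
        ).digest()
        return int.from_bytes(digest, "big") % self.slots_per_bank

    def lookup(self, key):
        # Probe one slot in every bank; hardware could do this in parallel.
        for bank in range(self.num_banks):
            entry = self.banks[bank][self._index(bank, key)]
            if entry is not None and entry[0] == key:
                return entry[1]
        return None

    def insert(self, key, value):
        """Store key -> value; return the evicted (key, value) or None."""
        candidates = [(b, self._index(b, key)) for b in range(self.num_banks)]
        # 1. Overwrite in place if the key is already present.
        for bank, idx in candidates:
            entry = self.banks[bank][idx]
            if entry is not None and entry[0] == key:
                self.banks[bank][idx] = (key, value)
                return None
        # 2. Otherwise use the first empty candidate slot.
        for bank, idx in candidates:
            if self.banks[bank][idx] is None:
                self.banks[bank][idx] = (key, value)
                return None
        # 3. All candidates are occupied: evict the bank-0 candidate
        #    (a deliberately simplistic policy chosen for this sketch).
        bank, idx = candidates[0]
        evicted = self.banks[bank][idx]
        self.banks[bank][idx] = (key, value)
        return evicted
```

With several banks, a new key is displaced only when all of its candidate slots are already taken, which is why conflicts become rarer as banks are added. A degenerate table with one bank and one slot shows the eviction path directly: inserting a second key returns the first key's pair, and a later lookup of the first key returns `None`.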

Supplemental Material

dhawan.zip (91.8 KB): supplemental movie, appendix, image, and software files for Area-Efficient Near-Associative Memories on FPGAs

Index Terms

Area-Efficient Near-Associative Memories on FPGAs
1. Hardware
  1. Very large scale integration design
    1. Application-specific VLSI designs
2. Information systems
  1. Information storage systems
    1. Record storage systems
      1. Record storage alternatives
        1. Hashed file organization

Published in
ACM Transactions on Reconfigurable Technology and Systems Volume 7, Issue 4
January 2015
213 pages
ISSN: 1936-7406
EISSN: 1936-7414
DOI: 10.1145/2699137
Editor: Steve Wilton, Department of Electrical and Computer Engineering, University of British Columbia, Kaiser, Main Mall, Vancouver, Canada
Copyright © 2015 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 23 January 2015
- Accepted: 1 January 2014
- Revised: 1 October 2013
- Received: 1 June 2013
Author Tags
BRAM
CAM
FPGA
associative memory
cache
hashing
Qualifiers
- research-article
- Research
- Refereed