Abstract
Associative memories can map sparsely used keys to values with low latency but can incur heavy area overheads. The lack of customized hardware for associative memories in today's mainstream FPGAs exacerbates the overhead cost of building these memories from fixed-address-match BRAMs. In this article, we develop a new, FPGA-friendly memory system architecture based on a multiple-hash scheme that achieves near-associative performance without the area-delay overheads of a fully associative memory on FPGAs. We also develop a novel memory management algorithm that allows us to statistically mimic an associative memory. Using the proposed architecture as a 64KB L1 data cache, we show that, for a set of benchmark programs from the SPEC CPU2006 suite, it achieves near-associative miss rates while consuming 3--13× fewer FPGA memory resources than fully associative memories generated by the Xilinx Coregen tool. The benefits of our architecture increase with key width, allowing area reductions of up to 100×. For a 64-bit key on a Xilinx Virtex-6 device, mapping delay is also reduced to 3.7ns for a 1,024-entry flat version or 6.1ns for an area-efficient version, compared to 17.6ns for a fully associative memory.
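To make the idea of a multiple-hash lookup concrete, the following is a minimal software sketch of a generic k-choice hashed key-value store. It is not the architecture or the memory management algorithm proposed in the article, whose details the abstract does not give. The class name MultiHashMemory, the salted SHA-256 hashing, and the evict-from-the-first-table policy are all assumptions chosen for illustration. Each key has one candidate slot in each of several tables, so a lookup needs only a fixed number of probes (which hardware could issue in parallel) instead of a comparison against every stored key.

```python
import hashlib


class MultiHashMemory:
    """Toy k-choice hashed key-value store (illustrative assumption only)."""

    def __init__(self, num_tables=4, slots_per_table=256):
        self.num_tables = num_tables
        self.slots_per_table = slots_per_table
        # Each slot holds either None or a (key, value) pair.
        self.tables = [[None] * slots_per_table for _ in range(num_tables)]
        self.evictions = 0  # number of times an occupied slot was overwritten

    def _index(self, table_id, key):
        # Salt the hash with the table id so each table uses a different mapping.
        digest = hashlib.sha256(f"{table_id}:{key}".encode()).digest()
        return int.from_bytes(digest[:8], "little") % self.slots_per_table

    def lookup(self, key):
        # Probe the single candidate slot in every table; a stored-key
        # comparison rejects entries that belong to other keys.
        for t in range(self.num_tables):
            slot = self.tables[t][self._index(t, key)]
            if slot is not None and slot[0] == key:
                return slot[1]
        return None  # miss

    def insert(self, key, value):
        candidates = [(t, self._index(t, key)) for t in range(self.num_tables)]
        # Update in place if the key is already stored.
        for t, i in candidates:
            slot = self.tables[t][i]
            if slot is not None and slot[0] == key:
                self.tables[t][i] = (key, value)
                return
        # Otherwise take the first empty candidate slot.
        for t, i in candidates:
            if self.tables[t][i] is None:
                self.tables[t][i] = (key, value)
                return
        # All candidates occupied: overwrite the first one (hypothetical policy).
        t, i = candidates[0]
        self.tables[t][i] = (key, value)
        self.evictions += 1


if __name__ == "__main__":
    mem = MultiHashMemory(num_tables=4, slots_per_table=64)
    mem.insert(0xDEADBEEF, "line A")
    print(mem.lookup(0xDEADBEEF))
    print(mem.lookup(0x12345678))
```

With several candidate slots per key, an insert fails to find a free slot only when all of them are occupied, which is why schemes of this kind can approach the conflict behavior of an associative memory while using ordinary addressed RAM.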