Area-Efficient Near-Associative Memories on FPGAs

Published: 23 January 2015

Abstract

Associative memories can map sparsely used keys to values with low latency, but they can incur heavy area overheads. The lack of customized hardware for associative memories in today's mainstream FPGAs exacerbates the overhead of building these memories from fixed-address-match BRAMs. In this article, we develop a new, FPGA-friendly memory system architecture based on a multiple-hash scheme that achieves near-associative performance without the area-delay overheads of a fully associative memory on FPGAs. We also develop a novel memory management algorithm that allows the architecture to statistically mimic an associative memory. Using the proposed architecture as a 64KB L1 data cache, we show that it achieves near-associative miss rates on a set of benchmark programs from the SPEC CPU2006 suite while consuming 3–13× fewer FPGA memory resources than fully associative memories generated by the Xilinx Coregen tool. The benefits of our architecture increase with key width, allowing area reductions of up to 100×. Mapping delay is also reduced to 3.7ns for a 1,024-entry flat version, or 6.1ns for an area-efficient version, compared with 17.6ns for a fully associative memory with a 64-bit key on a Xilinx Virtex-6 device.
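The abstract names a multiple-hash scheme but does not describe its details. The Python sketch below illustrates the general idea of multiple-choice hashing under stated assumptions. It is not the article's architecture. Each of `ways` banks stands in for one BRAM indexed by its own hash function, and a key may occupy exactly one of its `ways` candidate slots. A lookup probes every candidate (in parallel, in hardware) and compares the stored key to reject collisions. The class name `MultiHashMemory`, the seeded multiplicative hash, and the least-occupied-bank placement rule are all hypothetical choices. The article's memory management algorithm is not modeled, so an insert whose candidate slots are all occupied simply fails.

```python
from typing import List, Optional, Tuple

MASK64 = (1 << 64) - 1
GOLDEN = 0x9E3779B97F4A7C15  # odd 64-bit multiplier used for Fibonacci hashing


class MultiHashMemory:
    """Hypothetical d-way multiple-hash key->value store (illustrative only).

    Each of `ways` banks models one BRAM addressed by its own hash function;
    a key may live in exactly one slot, at index h_i(key) of bank i.
    """

    def __init__(self, ways: int = 4, index_bits: int = 8) -> None:
        self.ways = ways
        self.index_bits = index_bits
        size = 1 << index_bits
        # Each slot holds a (key, value) pair, or None when empty.
        self.banks: List[List[Optional[Tuple[int, int]]]] = [
            [None] * size for _ in range(ways)
        ]
        self.occupancy = [0] * ways
        # Assumed per-bank seeds that make the hash functions differ.
        self.seeds = [((i + 1) * 0xBF58476D1CE4E5B9) & MASK64 for i in range(ways)]

    def _index(self, way: int, key: int) -> int:
        # Multiplicative hash: keep the top index_bits of (key ^ seed) * GOLDEN.
        x = ((key ^ self.seeds[way]) * GOLDEN) & MASK64
        return x >> (64 - self.index_bits)

    def lookup(self, key: int) -> Optional[int]:
        # Hardware would read all banks at once; here we loop over them.
        for way in range(self.ways):
            slot = self.banks[way][self._index(way, key)]
            if slot is not None and slot[0] == key:  # full-key compare
                return slot[1]
        return None

    def insert(self, key: int, value: int) -> bool:
        candidates = [(way, self._index(way, key)) for way in range(self.ways)]
        # If the key is already resident, overwrite its value in place.
        for way, idx in candidates:
            slot = self.banks[way][idx]
            if slot is not None and slot[0] == key:
                self.banks[way][idx] = (key, value)
                return True
        # Otherwise use the empty candidate in the least-occupied bank
        # (ties go to the lowest-numbered bank).
        empty = [(self.occupancy[way], way, idx) for way, idx in candidates
                 if self.banks[way][idx] is None]
        if not empty:
            return False  # all candidates taken: a real design must evict or spill
        _, way, idx = min(empty)
        self.banks[way][idx] = (key, value)
        self.occupancy[way] += 1
        return True

    def __len__(self) -> int:
        return sum(self.occupancy)


if __name__ == "__main__":
    import random

    mem = MultiHashMemory(ways=4, index_bits=8)  # 4 banks x 256 slots = 1,024 slots
    rng = random.Random(1)
    stored = 0
    while mem.insert(rng.getrandbits(64), stored):
        stored += 1
    print(f"first conflict after {stored} keys ({stored / 1024:.0%} full)")
```

The `__main__` block fills a 4 × 256 table with random 64-bit keys until the first conflict. This gives a rough sense of how full a multiple-hash table can get before some management policy, such as eviction, relocation, or a spill structure, becomes necessary.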

      Published in

        ACM Transactions on Reconfigurable Technology and Systems, Volume 7, Issue 4
        January 2015
        213 pages
        ISSN: 1936-7406
        EISSN: 1936-7414
        DOI: 10.1145/2699137
        Editor: Steve Wilton

        Copyright © 2015 ACM

        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 23 January 2015
        • Accepted: 1 January 2014
        • Revised: 1 October 2013
        • Received: 1 June 2013
        Published in TRETS Volume 7, Issue 4

        Qualifiers

        • research-article
        • Research
        • Refereed
