DOI: 10.1145/3626202.3637575

CompressedLUT: An Open Source Tool for Lossless Compression of Lookup Tables for Function Evaluation and Beyond

Published: 02 April 2024

ABSTRACT

Lookup tables are widely used in hardware to store arrays of constant values. For instance, complex mathematical functions are typically implemented in hardware through table-based methods such as plain tabulation, piecewise linear approximation, and bipartite or multipartite table methods, all of which rely primarily on lookup tables to evaluate the functions. Storing extensive tables of constant values, however, can lead to excessive hardware costs in resource-constrained edge devices such as FPGAs. In this paper, we propose CompressedLUT, a lossless compression scheme for arrays of arbitrary data implemented as lookup tables. Our method exploits decomposition, self-similarities, higher-bit compression, and multilevel compression techniques to maximize table size savings with no accuracy loss. During decoding, CompressedLUT retrieves the original data using only addition and arithmetic right shift alongside several small lookup tables. These cost-effective elements allow our method to occupy low area and deliver high throughput. For evaluation, we compressed a number of different lookup tables, either obtained by direct tabulation of 12-bit elementary functions or generated by other table-based methods for approximating functions at higher resolutions, such as the multipartite table method at 24-bit, the piecewise polynomial approximation method at 36-bit, and the hls4ml library at 18-bit resolution. We implemented the compressed tables on FPGAs using HLS to show the efficiency of our method in terms of hardware costs compared to previous works. On average, our method achieved 60% table size compression and 2.33 times higher throughput per slice than conventional implementations. In comparison, the previous TwoTable and LDTC methods compressed the lookup tables by 33% and 37% on average, which resulted in 1.63 and 1.29 times higher throughput than the conventional implementations, respectively.
CompressedLUT is available as an open source tool.
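
To make the idea of lossless table decomposition concrete, the following is a minimal Python sketch, not the CompressedLUT algorithm itself. It assumes a simple bias-plus-residual split: the table is cut into fixed-length sub-tables, each sub-table's minimum is stored once in a small bias table, and the remaining residuals are stored at a narrower bit width. Decoding is a single addition of two small table reads. The function names (`compress`, `decode`, `storage_bits`), the sine test table, and the sub-table length of 64 are all assumptions chosen for illustration; the self-similarity, higher-bit, and multilevel techniques and the arithmetic right shift named in the abstract are not modeled here.

```python
import math


def bits_needed(max_value: int) -> int:
    """Bits required to store unsigned integers in [0, max_value]."""
    return max(1, max_value.bit_length())


def storage_bits(values) -> int:
    """Total bits for a table whose entries all share one bit width."""
    return len(values) * bits_needed(max(values))


def compress(table, sub_len):
    """Split `table` into sub-tables of `sub_len` entries.

    Returns (biases, residuals): one minimum per sub-table, and the
    non-negative difference of every entry from its sub-table minimum.
    """
    biases, residuals = [], []
    for start in range(0, len(table), sub_len):
        chunk = table[start:start + sub_len]
        bias = min(chunk)
        biases.append(bias)
        residuals.extend(value - bias for value in chunk)
    return biases, residuals


def decode(biases, residuals, sub_len, index):
    """Recover the original entry with two table reads and one addition."""
    return biases[index // sub_len] + residuals[index]


# Hypothetical test data: 12-bit-in, 12-bit-out tabulation of sin on [0, pi/2).
table = [round(math.sin(math.pi / 2 * i / 4096) * 4095) for i in range(4096)]

SUB_LEN = 64  # assumed sub-table length, not taken from the paper
biases, residuals = compress(table, SUB_LEN)

# Lossless: every entry is reconstructed exactly.
lossless = all(
    decode(biases, residuals, SUB_LEN, i) == table[i] for i in range(len(table))
)

original_bits = storage_bits(table)
compressed_bits = storage_bits(biases) + storage_bits(residuals)
print(f"lossless={lossless} original={original_bits} compressed={compressed_bits}")
```

The saving in this sketch comes entirely from the residuals needing fewer bits than the full-width entries when neighbouring values are close, which is typical for smooth tabulated functions. Tables of arbitrary data may not shrink under this scheme alone.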

