CompressedLUT: An Open Source Tool for Lossless Compression of Lookup Tables for Function Evaluation and Beyond

Authors:
Alireza Khataei
Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN, USA
ORCID: 0000-0002-9146-5684

Kia Bazargan
Department of Electrical and Computer Engineering, University of Minnesota, Minneapolis, MN, USA
ORCID: 0000-0003-3624-7366

FPGA '24: Proceedings of the 2024 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, April 2024, Pages 2–11. https://doi.org/10.1145/3626202.3637575

Published: 02 April 2024

ABSTRACT

Lookup tables are widely used in hardware to store arrays of constant values. For instance, complex mathematical functions in hardware are typically implemented through table-based methods such as plain tabulation, piecewise linear approximation, and bipartite or multipartite table methods, all of which rely primarily on lookup tables to evaluate the functions. Storing extensive tables of constant values, however, can lead to excessive hardware costs in resource-constrained edge devices such as FPGAs. In this paper, we propose CompressedLUT, a lossless compression scheme for arrays of arbitrary data implemented as lookup tables. Our method exploits decomposition, self-similarities, higher-bit compression, and multilevel compression techniques to maximize table size savings with no accuracy loss. During the decoding phase, CompressedLUT retrieves the original data using only addition, arithmetic right shift, and several small lookup tables. Using such cost-effective elements keeps area low and throughput high. For evaluation purposes, we compressed a number of different lookup tables, either obtained by direct tabulation of 12-bit elementary functions or generated by other table-based methods for approximating functions at higher resolutions, such as the multipartite table method at 24-bit, the piecewise polynomial approximation method at 36-bit, and the hls4ml library at 18-bit resolutions. We implemented the compressed tables on FPGAs using HLS to show the efficiency of our method in terms of hardware costs compared to previous works. On average, our method demonstrated 60% table size compression and achieved 2.33 times higher throughput per slice than conventional implementations. In comparison, the previous TwoTable and LDTC works compressed the lookup tables on average by 33% and 37%, which resulted in 1.63 and 1.29 times higher throughput than the conventional implementations, respectively.
CompressedLUT is available as an open source tool.
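
To make the idea of lossless table decomposition concrete, the following is a minimal Python sketch and not the CompressedLUT algorithm itself. It assumes one simple, generic form of decomposition: the table is split into equal sub-tables, each sub-table's minimum is stored in a small bias table, and the remaining residuals are stored at a narrower bit width. The original value is recovered with a single addition. The function names (`tabulate`, `decompose`, `lookup`), the choice of a 12-bit sine table, and the sub-table size are all illustrative assumptions; self-similarity, higher-bit compression, and multilevel compression are not shown.

```python
import math

def tabulate(f, in_bits, out_bits):
    """Plain tabulation: quantize f on [0, 1) to unsigned out_bits integers."""
    n = 1 << in_bits
    scale = (1 << out_bits) - 1
    return [round(f(i / n) * scale) for i in range(n)]

def bits_needed(values):
    """Bit width needed to store the largest non-negative value (at least 1)."""
    return max(1, max(values).bit_length())

def storage_bits(values):
    """Total bits if every entry is stored at the width of the largest entry."""
    return len(values) * bits_needed(values)

def decompose(table, sub_bits):
    """Split into sub-tables of 2**sub_bits entries (hypothetical scheme).

    Returns (bias, residual): one minimum per sub-table, and the
    non-negative difference of every entry from its sub-table minimum.
    """
    size = 1 << sub_bits
    bias = [min(table[s:s + size]) for s in range(0, len(table), size)]
    residual = [v - bias[i >> sub_bits] for i, v in enumerate(table)]
    return bias, residual

def lookup(bias, residual, sub_bits, i):
    """Decode entry i: two small table reads and one addition."""
    return bias[i >> sub_bits] + residual[i]

# Illustrative 12-bit-in, 12-bit-out table of sin(pi/2 * x) on [0, 1).
table = tabulate(lambda x: math.sin(x * math.pi / 2), in_bits=12, out_bits=12)
sub_bits = 5  # assumed sub-table size of 32 entries
bias, residual = decompose(table, sub_bits)

original_bits = storage_bits(table)
compressed_bits = storage_bits(bias) + storage_bits(residual)
lossless = all(lookup(bias, residual, sub_bits, i) == table[i]
               for i in range(len(table)))

print("original bits:  ", original_bits)
print("compressed bits:", compressed_bits)
print("lossless:       ", lossless)
```

Because the function is smooth, the residuals within each sub-table span a small range and need far fewer bits than the full output width, which is where the savings in this sketch come from. The sub-table size trades bias-table storage against residual width, so a real tool would presumably search over such parameters rather than fix them.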

References

Nelson Campos, Slava Chesnokov, Eran Edirisinghe, and Alexis Lluis. 2021. FPGA Implementation of Custom Floating-Point Logarithm and Division. In Applied Reconfigurable Computing. Architectures, Tools, and Applications, Steven Derrien, Frank Hannig, Pedro C. Diniz, and Daniel Chillet (Eds.). Springer International Publishing, Cham, 295--304.
Maxime Christ, Luc Forget, and Florent de Dinechin. 2022. Lossless Differential Table Compression for Hardware Function Evaluation. IEEE Transactions on Circuits and Systems II: Express Briefs, Vol. 69, 3 (2022), 1642--1646. https://doi.org/10.1109/TCSII.2021.3131405
Florent de Dinechin and Bogdan Pasca. 2011. Designing Custom Arithmetic Data Paths with FloPoCo. IEEE Design & Test of Computers, Vol. 28, 4 (2011), 18--27. https://doi.org/10.1109/MDT.2011.44
F. de Dinechin and A. Tisserand. 2005. Multipartite table methods. IEEE Trans. Comput., Vol. 54, 3 (2005), 319--330. https://doi.org/10.1109/TC.2005.54
J. Detrey and F. de Dinechin. 2005. Table-based polynomials for fast hardware function evaluation. In 2005 IEEE International Conference on Application-Specific Systems, Architecture Processors (ASAP'05). 328--333. https://doi.org/10.1109/ASAP.2005.61
Javier Duarte et al. 2018. Fast inference of deep neural networks in FPGAs for particle physics. JINST, Vol. 13, 07 (2018), P07027. https://doi.org/10.1088/1748-0221/13/07/P07027 arXiv:1804.06913 [physics.ins-det]
S. Rasoul Faraji and Kia Bazargan. 2019. Hybrid Binary-Unary Hardware Accelerator. In Proceedings of the 24th Asia and South Pacific Design Automation Conference (Tokyo, Japan) (ASPDAC '19). Association for Computing Machinery, New York, NY, USA, 210--215. https://doi.org/10.1145/3287624.3287706
FastML Team. 2023. fastmachinelearning/hls4ml. https://doi.org/10.5281/zenodo.1201549
Y. Serhan Gener, Sezer Gören, and H. Fatih Ugurdag. 2019. Lossless Look-Up Table Compression for Hardware Implementation of Transcendental Functions. In 2019 IFIP/IEEE 27th International Conference on Very Large Scale Integration (VLSI-SoC). 52--57. https://doi.org/10.1109/VLSI-SoC.2019.8920330
Shen-Fu Hsiao, Kun-Chih Chen, and Yi-Hau Chen. 2018. Optimization of Lookup Table Size in Table-Bound Design of Function Computation. In 2018 IEEE International Symposium on Circuits and Systems (ISCAS). 1--4. https://doi.org/10.1109/ISCAS.2018.8350933
Shen-Fu Hsiao, Chia-Sheng Wen, Yi-Hau Chen, and Kuei-Chun Huang. 2017. Hierarchical Multipartite Function Evaluation. IEEE Trans. Comput., Vol. 66, 1 (2017), 89--99. https://doi.org/10.1109/TC.2016.2574314
Shen-Fu Hsiao, Po-Han Wu, Chia-Sheng Wen, and Pramod Kumar Meher. 2015. Table Size Reduction Methods for Faithfully Rounded Lookup-Table-Based Multiplierless Function Evaluation. IEEE Transactions on Circuits and Systems II: Express Briefs, Vol. 62, 5 (2015), 466--470. https://doi.org/10.1109/TCSII.2014.2386232
Alireza Khataei, Gaurav Singh, and Kia Bazargan. 2023a. Approximate Hybrid Binary-Unary Computing with Applications in BERT Language Model and Image Processing. In Proceedings of the 2023 ACM/SIGDA International Symposium on Field Programmable Gate Arrays (Monterey, CA, USA) (FPGA '23). Association for Computing Machinery, New York, NY, USA, 165--175. https://doi.org/10.1145/3543622.3573181
Alireza Khataei, Gaurav Singh, and Kia Bazargan. 2023b. Optimizing Hybrid Binary-Unary Hardware Accelerators Using Self-Similarity Measures. In 2023 IEEE 31st Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM). 105--113. https://doi.org/10.1109/FCCM57271.2023.00020
Martin Langhammer and Bogdan Pasca. 2016. Single Precision Natural Logarithm Architecture for Hard Floating-Point and DSP-Enabled FPGAs. In 2016 IEEE 23nd Symposium on Computer Arithmetic (ARITH). 164--171. https://doi.org/10.1109/ARITH.2016.20
Jean-Michel Muller. 2020. Elementary Functions and Approximate Computing. Proc. IEEE, Vol. 108, 12 (2020), 2136--2149. https://doi.org/10.1109/JPROC.2020.2991885
Pragnesh Patel, Aman Arora, Earl Swartzlander, and Lizy John. 2022. LogGen: A Parameterized Generator for Designing Floating-Point Logarithm Units for Deep Learning. In 2022 23rd International Symposium on Quality Electronic Design (ISQED). 1--7. https://doi.org/10.1109/ISQED54688.2022.9806139
M.J. Schulte and J.E. Stine. 1999. Approximating elementary functions with symmetric bipartite tables. IEEE Trans. Comput., Vol. 48, 8 (1999), 842--847. https://doi.org/10.1109/12.795125
Yusheng Xie, Alex Noel Joseph Raj, Zhendong Hu, Shaohaohan Huang, Zhun Fan, and Miroslav Joler. 2020. A Twofold Lookup Table Architecture for Efficient Approximation of Activation Functions. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 28, 12 (2020), 2540--2550. https://doi.org/10.1109/TVLSI.2020.3015391

Index Terms

1. Hardware
  1. Electronic design automation
    1. High-level and register-transfer level synthesis
  2. Integrated circuits
    1. Reconfigurable logic and FPGAs
      1. Hardware accelerators


Published in
FPGA '24: Proceedings of the 2024 ACM/SIGDA International Symposium on Field Programmable Gate Arrays
April 2024
300 pages
ISBN: 9798400704185
DOI: 10.1145/3626202
General Chair: Zhiru Zhang (Cornell University, USA)
Program Chair: Andrew Putnam (Microsoft, USA)
Copyright © 2024 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
function evaluation
hardware acceleration
high-level synthesis
lookup table
lossless compression
table size reduction
table-based method
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 125 of 627 submissions, 20%