research-article

Approximate Hybrid Binary-Unary Computing with Applications in BERT Language Model and Image Processing

Authors:
Alireza Khataei, University of Minnesota, Minneapolis, MN, USA (ORCID: 0000-0002-9146-5684)
Gaurav Singh, University of Minnesota, Minneapolis, MN, USA (ORCID: 0000-0001-5232-8145)
Kia Bazargan, University of Minnesota, Minneapolis, MN, USA (ORCID: 0000-0003-3624-7366)

FPGA '23: Proceedings of the 2023 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, February 2023, Pages 165–175. https://doi.org/10.1145/3543622.3573181

Published: 12 February 2023

ABSTRACT

We propose a novel method for approximate hardware implementation of univariate math functions that uses significantly fewer hardware resources than previous approaches. Examples of such functions include exp(x) and the activation function GELU(x), both used in transformer networks; gamma(x), which is used in image processing; and other functions such as tanh(x), cosh(x), sq(x), and sqrt(x). The method builds on previous work on hybrid binary-unary computing. The novelty of our approach is that we break a function into a number of sub-functions such that implementing each sub-function becomes cheap and converting the outputs of the sub-functions to binary becomes almost trivial. Our method also exploits self-similarity in functions to further reduce the cost. We compare our method to the conventional binary, previous stochastic computing, and hybrid binary-unary methods on several functions at 8-, 12-, and 16-bit resolutions. While preserving high accuracy, our method outperforms previous works in terms of hardware cost. For example, when tolerating a mean absolute error of less than 0.01, our method reduces the (area × latency) cost on average by 5, 7, and 2 orders of magnitude compared to the conventional binary, stochastic computing, and hybrid binary-unary methods, respectively. Finally, we demonstrate the potential benefits of our method for natural language processing and image processing applications. We deploy our method to implement major blocks in an encoding layer of the BERT language model, as well as the Roberts Cross edge detection algorithm, both of which include non-linear functions.
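
The idea of splitting a function into cheap sub-functions can be illustrated in software. The sketch below is not the authors' hardware design; it is a minimal model under stated assumptions: GELU is tabulated at 8-bit resolution on an assumed input range of [-4, 4] and output range of [-0.2, 4], the upper 4 input bits select a sub-function, the lower bits are thermometer (unary) coded, and each sub-function is stored as a bias plus small offsets. All function names, ranges, and the bias-plus-offset storage are hypothetical choices made for illustration.

```python
import math

def gelu(x):
    """Reference GELU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def thermometer(value, width):
    """Unary (thermometer) code: `value` ones followed by zeros."""
    return [1] * value + [0] * (width - value)

def quantize_table(f, x_lo, x_hi, y_lo, y_hi, bits):
    """Tabulate f on [x_lo, x_hi] with `bits`-bit input and output codes."""
    top = (1 << bits) - 1
    table = []
    for code in range(top + 1):
        x = x_lo + (x_hi - x_lo) * code / top
        q = round((f(x) - y_lo) / (y_hi - y_lo) * top)
        table.append(min(max(q, 0), top))  # clamp to the output range
    return table

def split_subfunctions(table, bits, upper_bits):
    """Split the table by the upper input bits into (bias, offsets) pairs."""
    size = 1 << (bits - upper_bits)
    subs = []
    for s in range(1 << upper_bits):
        chunk = table[s * size:(s + 1) * size]
        bias = min(chunk)
        subs.append((bias, [v - bias for v in chunk]))
    return subs

def evaluate(subs, code, bits, upper_bits):
    """Look up one input code: binary upper bits, unary lower bits."""
    lower_bits = bits - upper_bits
    upper = code >> lower_bits
    lower = code & ((1 << lower_bits) - 1)
    unary = thermometer(lower, (1 << lower_bits) - 1)
    bias, offsets = subs[upper]
    return bias + offsets[sum(unary)]  # sum(unary) recovers `lower`

# Assumed (hypothetical) resolution and ranges for this illustration.
BITS, UPPER_BITS = 8, 4
X_LO, X_HI, Y_LO, Y_HI = -4.0, 4.0, -0.2, 4.0
table = quantize_table(gelu, X_LO, X_HI, Y_LO, Y_HI, BITS)
subs = split_subfunctions(table, BITS, UPPER_BITS)

# Mean absolute error of the quantized model against the float reference.
top = (1 << BITS) - 1
step = (Y_HI - Y_LO) / top
errors = []
for code in range(top + 1):
    x = X_LO + (X_HI - X_LO) * code / top
    y_hat = Y_LO + evaluate(subs, code, BITS, UPPER_BITS) * step
    errors.append(abs(y_hat - gelu(x)))
mae = sum(errors) / len(errors)
widest = max(max(offsets) for _, offsets in subs)
print(f"MAE = {mae:.5f}, widest sub-function offset = {widest}")
```

In this model each sub-function only has to produce a small offset, and the final binary output is the offset plus a per-sub-function constant, which hints at why the conversion back to binary can be cheap. The error measured here comes only from output rounding at the sample points and says nothing about the area or latency figures reported in the abstract.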

Index Terms

1. Computing methodologies
  1. Computer graphics
    1. Image manipulation
      1. Image processing
  2. Machine learning
    1. Machine learning approaches
      1. Neural networks
2. Hardware
  1. Integrated circuits
    1. Reconfigurable logic and FPGAs

Published in
FPGA '23: Proceedings of the 2023 ACM/SIGDA International Symposium on Field Programmable Gate Arrays
February 2023, 283 pages
ISBN: 9781450394178
DOI: 10.1145/3543622
General Chair: Paolo Ienne, EPFL, Switzerland
Program Chair: Zhiru Zhang, Cornell University, USA
Copyright © 2023 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
Author Tags
BERT language model
approximate computing
hardware accelerators
image processing
stochastic computing
unary computing
Acceptance Rates
Overall Acceptance Rate: 125 of 627 submissions, 20%