DOI: 10.1145/3543622.3573181
research-article

Approximate Hybrid Binary-Unary Computing with Applications in BERT Language Model and Image Processing

Published: 12 February 2023

ABSTRACT

We propose a novel method for the approximate hardware implementation of univariate math functions that uses significantly fewer hardware resources than previous approaches. Examples of such functions include exp(x) and the GELU(x) activation function, both used in transformer networks; gamma(x), which is used in image processing; and other functions such as tanh(x), cosh(x), sq(x), and sqrt(x). The method builds on previous work on hybrid binary-unary computing. The novelty of our approach is that we break a function into a number of sub-functions such that each sub-function is cheap to implement and converting its output to binary becomes almost trivial. Our method also exploits self-similarity in functions to further reduce the cost. We compare our method with the conventional binary, previous stochastic computing, and hybrid binary-unary methods on several functions at 8-, 12-, and 16-bit resolutions. While preserving high accuracy, our method outperforms previous work in hardware cost: for example, when tolerating a mean absolute error below 0.01, it reduces the area × latency cost on average by 5, 7, and 2 orders of magnitude compared to the conventional binary, stochastic computing, and hybrid binary-unary methods, respectively. Finally, we demonstrate the potential benefits of our method for natural language processing and image processing applications by using it to implement major blocks in an encoding layer of the BERT language model and the Roberts Cross edge detection algorithm, both of which include non-linear functions.
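The abstract describes the decomposition only at a high level, so the following is a rough software illustration and not the authors' algorithm. It models a generic piecewise scheme loosely in the spirit of hybrid binary-unary computing: the upper k bits of an n-bit input select a segment, each segment stores a binary bias, and the lower bits index a small sub-function that hardware could implement cheaply. Letting segments share near-identical sub-functions is a crude, hypothetical stand-in for exploiting self-similarity. The function names (quantize_function, decompose, evaluate, mean_abs_error), the greedy tolerance rule, and the choice of sqrt(x) on [0, 1] are all illustrative assumptions. The script reports mean absolute error on the [0, 1] scale, the accuracy metric the abstract uses, and does not model area or latency.

```python
import math
from typing import Callable, List, Tuple


def quantize_function(f: Callable[[float], float], n_bits: int) -> List[int]:
    """Tabulate f at n-bit input and output resolution.

    Assumes f maps [0, 1] into [0, 1]; input code i stands for i / (2**n - 1).
    """
    max_code = (1 << n_bits) - 1
    return [round(f(i / max_code) * max_code) for i in range(max_code + 1)]


def decompose(table: List[int], k: int, tol: int) -> Tuple[List[int], List[int], List[List[int]]]:
    """Split the table into 2**k segments selected by the upper k input bits.

    Each segment keeps a binary bias (its first output value) plus a
    sub-function of the lower input bits. A segment reuses an earlier
    representative sub-function if they differ by at most `tol` output LSBs
    (a hypothetical greedy rule standing in for self-similarity).
    """
    seg_len = len(table) >> k
    biases: List[int] = []
    assign: List[int] = []
    reps: List[List[int]] = []
    for s in range(1 << k):
        seg = table[s * seg_len:(s + 1) * seg_len]
        sub = [v - seg[0] for v in seg]
        biases.append(seg[0])
        for r, rep in enumerate(reps):
            if max(abs(a - b) for a, b in zip(sub, rep)) <= tol:
                assign.append(r)  # close enough: share an existing sub-function
                break
        else:
            reps.append(sub)  # no close match: keep a new sub-function
            assign.append(len(reps) - 1)
    return biases, assign, reps


def evaluate(x: int, n_bits: int, k: int, biases, assign, reps) -> int:
    """Upper k bits pick the segment; the lower bits index its sub-function."""
    m = n_bits - k
    s, t = x >> m, x & ((1 << m) - 1)
    y = biases[s] + reps[assign[s]][t]
    return min(max(y, 0), (1 << n_bits) - 1)  # saturate to the n-bit output range


def mean_abs_error(f, n_bits: int, k: int, tol: int) -> Tuple[float, int]:
    """MAE on the [0, 1] scale against the exact f, plus the sub-function count."""
    table = quantize_function(f, n_bits)
    biases, assign, reps = decompose(table, k, tol)
    max_code = (1 << n_bits) - 1
    total = sum(
        abs(evaluate(x, n_bits, k, biases, assign, reps) / max_code - f(x / max_code))
        for x in range(max_code + 1)
    )
    return total / (max_code + 1), len(reps)


if __name__ == "__main__":
    for tol in (0, 1, 2):
        mae, n_subs = mean_abs_error(math.sqrt, n_bits=8, k=4, tol=tol)
        print(f"tol={tol}: {n_subs} distinct sub-functions, MAE={mae:.4f}")
```

With tol=0 the model reproduces the rounded n-bit table exactly; raising tol can reduce the number of distinct sub-functions, and each shared sub-function adds at most tol output LSBs of error on top of the n-bit rounding error.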


          Published in

          FPGA '23: Proceedings of the 2023 ACM/SIGDA International Symposium on Field Programmable Gate Arrays
          February 2023
          283 pages
          ISBN: 9781450394178
          DOI: 10.1145/3543622

          Copyright © 2023 ACM

          Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

          Publisher: Association for Computing Machinery, New York, NY, United States


          Acceptance Rates

          Overall acceptance rate: 125 of 627 submissions, 20%
