DOI: 10.1145/3508352.3549418

Squeezing Accumulators in Binary Neural Networks for Extremely Resource-Constrained Applications

Published: 22 December 2022

Abstract

The cost and power consumption of BNN (Binarized Neural Network) hardware are dominated by additions. In particular, accumulators account for a large fraction of the hardware overhead, which can be reduced effectively by using narrower accumulators. However, finding the optimal accumulator width is not straightforward due to the complex interplay between width, scale, and the effect of training. In this paper we present algorithmic and hardware-level methods to find the optimal accumulator size for BNN hardware with minimal impact on the quality of results. First, we present partial sum scaling, a top-down approach that minimizes the BNN accumulator size based on advanced quantization techniques, along with an efficient, zero-overhead hardware design for it. Second, we evaluate a bottom-up approach, namely using a saturating accumulator, which is more robust against overflow. Our experimental results on the CIFAR-10 dataset demonstrate that partial sum scaling, combined with our optimized accumulator architecture, can reduce the area and power consumption of the datapath by 15.50% and 27.03%, respectively, with little impact on inference performance (less than 2%), compared to using a 16-bit accumulator.
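
To make the two approaches in the abstract concrete, the sketch below contrasts a conventional wrapping accumulator with a saturating one, and adds a hypothetical power-of-two partial-sum-scaling variant. It only illustrates the general ideas under assumed parameters (8-bit accumulator, chunk size 64, shift of 3); the paper's actual partial sum scaling is derived from quantization techniques and implemented in hardware, and none of the names or constants below are taken from the paper.

    import numpy as np

    def wrapping_accumulate(products, width):
        # Ordinary modulo-2**width accumulation: an overflow silently
        # wraps around and can flip the sign of the running sum.
        mask = (1 << width) - 1
        acc = 0
        for p in products:
            acc = (acc + int(p)) & mask
        if acc >= 1 << (width - 1):      # reinterpret as two's complement
            acc -= 1 << width
        return acc

    def saturating_accumulate(products, width):
        # The bottom-up idea: clamp at the limits of a width-bit
        # two's-complement register instead of wrapping on overflow.
        lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
        acc = 0
        for p in products:
            acc = max(lo, min(hi, acc + int(p)))
        return acc

    def scaled_accumulate(products, width, chunk=64, shift=3):
        # A hypothetical stand-in for partial sum scaling: sum each small
        # chunk exactly (as an adder tree would), right-shift the partial
        # sum so the running total fits a narrow register, then undo the
        # scale at the end. The fixed shift here is only illustrative.
        lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
        acc = 0
        for i in range(0, len(products), chunk):
            psum = int(products[i:i + chunk].sum())  # adder-tree output
            acc = max(lo, min(hi, acc + (psum >> shift)))
        return acc << shift

    # In a binarized dot product every partial product is +/-1, so the
    # exact sum of n terms lies in [-n, n]. A biased input forces the
    # exact sum well past the 8-bit range [-128, 127].
    rng = np.random.default_rng(0)
    products = rng.choice([-1, 1], size=1024, p=[0.3, 0.7])

    print("exact:           ", int(products.sum()))
    print("8-bit wrapping:  ", wrapping_accumulate(products, 8))
    print("8-bit saturating:", saturating_accumulate(products, 8))
    print("8-bit scaled:    ", scaled_accumulate(products, 8))

On such an input the wrapping accumulator typically returns a sign-corrupted value, the saturating accumulator pins at 127 (wrong in magnitude but with bounded error), and the scaled variant stays close to the exact sum at the cost of the bits discarded by the shift, which is the kind of width/scale/accuracy trade-off the paper optimizes.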


Cited By

  • A2Q+. Proceedings of the 41st International Conference on Machine Learning (2024), 9275-9291. DOI: 10.5555/3692070.3692439. Online publication date: 21 July 2024.
  • Extending Neural Processing Unit and Compiler for Advanced Binarized Neural Networks. Proceedings of the 29th Asia and South Pacific Design Automation Conference (2024), 115-120. DOI: 10.1109/ASP-DAC58780.2024.10473822. Online publication date: 22 January 2024.



          Published In

          ICCAD '22: Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design
          October 2022
          1467 pages
          ISBN: 978-1-4503-9217-4
          DOI: 10.1145/3508352

          In-Cooperation

          • IEEE-EDS: Electron Devices Society
          • IEEE CAS
          • IEEE CEDA

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          Published: 22 December 2022


          Author Tags

          1. accumulator
          2. adder tree
          3. binarized neural network
          4. neural network accelerator
          5. quantization
          6. saturating arithmetic

          Qualifiers

          • Research-article

          Conference

          ICCAD '22: IEEE/ACM International Conference on Computer-Aided Design
          October 30 - November 3, 2022
          San Diego, California

          Acceptance Rates

          Overall acceptance rate: 457 of 1,762 submissions (26%)


          Article Metrics

          • Downloads (last 12 months): 41
          • Downloads (last 6 weeks): 0
          Reflects downloads up to 28 Feb 2025
