DOI: 10.1145/3508352.3549368
ICCAD Conference Proceedings · Research Article · Public Access

Towards High Performance and Accurate BNN Inference on FPGA with Structured Fine-Grained Pruning

Published: 22 December 2022

Abstract

As the extreme case of quantized networks, Binary Neural Networks (BNNs) have received tremendous attention due to their hardware-friendly storage and computation properties. To push compact models to the limit, we combine binarization with pruning techniques, further exploiting the redundancy of BNNs. However, coarse-grained pruning methods can cause severe accuracy drops, while traditional fine-grained ones induce irregular sparsity that is hard for hardware to exploit. In this paper, we propose two advanced fine-grained BNN pruning modules, structured channel-wise kernel pruning and dynamic spatial pruning, from a joint algorithm-hardware perspective. The pruned BNN models are trained from scratch and exhibit not only higher accuracy but also a high degree of parallelism. We then develop an accelerator architecture that effectively exploits the sparsity produced by our algorithm. Finally, we implement the pruned BNN models on an embedded FPGA (Ultra96v2). The results show that our software-hardware co-design achieves a 5.4× inference speedup over the baseline BNN, with higher resource and energy efficiency than prior FPGA-based BNN implementations.
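The structured channel-wise kernel pruning the abstract describes can be sketched roughly as follows. This is an illustrative sketch under our own assumptions, not the paper's implementation: the function name, the latent-weight L1 saliency criterion, and the fixed per-output-channel budget are all placeholders (the paper learns its masks during training from scratch). The key structural idea shown here is that every output channel keeps the same number of k×k kernels, so hardware compute lanes stay load-balanced despite the sparsity.

```python
import numpy as np

def prune_kernels_structured(w_latent, keep_per_out):
    """Illustrative structured channel-wise kernel pruning.

    w_latent: latent full-precision weights of a binarized conv layer,
              shape (C_out, C_in, k, k); the deployed weights are sign(w_latent).
    keep_per_out: number of input-channel kernels kept per output channel.

    Returns a 0/1 mask of shape (C_out, C_in). Each output channel keeps
    exactly `keep_per_out` whole k x k kernels, so the surviving work is
    identical across output channels (a parallelism-friendly pattern).
    """
    c_out, c_in = w_latent.shape[:2]
    # Placeholder saliency: L1 norm of each kernel's latent weights.
    # (On the ±1 binary weights themselves this would be constant.)
    saliency = np.abs(w_latent).sum(axis=(2, 3))        # (C_out, C_in)
    mask = np.zeros((c_out, c_in), dtype=np.int8)
    for o in range(c_out):
        keep = np.argsort(saliency[o])[-keep_per_out:]  # top-k kernels
        mask[o, keep] = 1
    return mask
```

Because whole kernels are removed rather than scattered individual weights, the accelerator can skip pruned (output, input) channel pairs with a simple per-channel index list instead of general sparse bookkeeping.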


Cited By

• (2023) Binary Domain Generalization for Sparsifying Binary Neural Networks. In Machine Learning and Knowledge Discovery in Databases: Research Track, pp. 123--140. DOI: 10.1007/978-3-031-43415-0_8. Online publication date: 18 September 2023.


Published In

ICCAD '22: Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design
October 2022, 1467 pages
ISBN: 9781450392174
DOI: 10.1145/3508352
          Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

In-Cooperation

• IEEE-EDS: Electron Devices Society
• IEEE CAS
• IEEE CEDA

Publisher

Association for Computing Machinery, New York, NY, United States


          Author Tags

          1. FPGA
          2. accelerator architecture
          3. binary neural networks
          4. structured pruning

          Qualifiers

          • Research-article

Conference

ICCAD '22: IEEE/ACM International Conference on Computer-Aided Design
October 30 - November 3, 2022
San Diego, California

Acceptance Rates

Overall Acceptance Rate: 457 of 1,762 submissions, 26%

Article Metrics

• Downloads (last 12 months): 167
• Downloads (last 6 weeks): 11

Reflects downloads up to 28 Feb 2025

