DOI: 10.1145/3508352.3549368
ICCAD Conference Proceedings · Research Article · Public Access

Towards High Performance and Accurate BNN Inference on FPGA with Structured Fine-Grained Pruning

Published: 22 December 2022

Abstract

As the extreme case of quantized networks, Binary Neural Networks (BNNs) have received tremendous attention due to their hardware-friendly storage and computation properties. To push compact models to the limit, we combine binarization with pruning techniques, further exploiting the redundancy of BNNs. However, coarse-grained pruning methods can cause severe accuracy drops, while traditional fine-grained ones induce irregular sparsity that is hard for hardware to exploit. In this paper, we propose two advanced fine-grained BNN pruning modules, structured channel-wise kernel pruning and dynamic spatial pruning, from a joint algorithm-hardware perspective. The pruned BNN models are trained from scratch and exhibit not only higher accuracy but also a high degree of parallelism. We then develop an accelerator architecture that effectively exploits the sparsity produced by our algorithm. Finally, we implement the pruned BNN models on an embedded FPGA (Ultra96v2). The results show that our software-hardware co-design achieves a 5.4× inference speedup over the baseline BNN, with higher resource and energy efficiency than prior FPGA-based BNN implementations.
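The structured channel-wise kernel pruning the abstract describes can be sketched roughly as follows. This is an illustrative sketch under our own assumptions, not the paper's implementation: the function name, the latent-weight L1 saliency criterion, and the fixed per-output-channel budget are all placeholders (the paper learns its masks during training from scratch). The key structural idea shown here is that every output channel keeps the same number of k×k kernels, so hardware compute lanes stay load-balanced despite the sparsity.

```python
import numpy as np

def prune_kernels_structured(w_latent, keep_per_out):
    """Illustrative structured channel-wise kernel pruning.

    w_latent: latent full-precision weights of a binarized conv layer,
              shape (C_out, C_in, k, k); the deployed weights are sign(w_latent).
    keep_per_out: number of input-channel kernels kept per output channel.

    Returns a 0/1 mask of shape (C_out, C_in). Each output channel keeps
    exactly `keep_per_out` whole k x k kernels, so the surviving work is
    identical across output channels (a parallelism-friendly pattern).
    """
    c_out, c_in = w_latent.shape[:2]
    # Placeholder saliency: L1 norm of each kernel's latent weights.
    # (On the ±1 binary weights themselves this would be constant.)
    saliency = np.abs(w_latent).sum(axis=(2, 3))        # (C_out, C_in)
    mask = np.zeros((c_out, c_in), dtype=np.int8)
    for o in range(c_out):
        keep = np.argsort(saliency[o])[-keep_per_out:]  # top-k kernels
        mask[o, keep] = 1
    return mask
```

Because whole kernels are removed rather than scattered individual weights, the accelerator can skip pruned (output, input) channel pairs with a simple per-channel index list instead of general sparse bookkeeping.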


Cited By

• (2023) Binary Domain Generalization for Sparsifying Binary Neural Networks. In Machine Learning and Knowledge Discovery in Databases: Research Track, pp. 123--140. DOI: 10.1007/978-3-031-43415-0_8. Online publication date: 18 September 2023.


Published In

ICCAD '22: Proceedings of the 41st IEEE/ACM International Conference on Computer-Aided Design
October 2022, 1467 pages
ISBN: 9781450392174
DOI: 10.1145/3508352
          Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of a national government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

In-Cooperation

• IEEE-EDS: Electron Devices Society
• IEEE CAS
• IEEE CEDA

Publisher

Association for Computing Machinery, New York, NY, United States


          Author Tags

          1. FPGA
          2. accelerator architecture
          3. binary neural networks
          4. structured pruning

          Qualifiers

          • Research-article

Conference

ICCAD '22: IEEE/ACM International Conference on Computer-Aided Design
October 30 - November 3, 2022
San Diego, California

Acceptance Rates

Overall Acceptance Rate: 457 of 1,762 submissions, 26%

Article Metrics

• Downloads (last 12 months): 167
• Downloads (last 6 weeks): 11

Reflects downloads up to 28 Feb 2025

