
BISWSRBS: A Winograd-based CNN Accelerator with a Fine-grained Regular Sparsity Pattern and Mixed Precision Quantization

Published: 13 September 2021

Abstract

Field-programmable Gate Arrays (FPGAs) are a high-performance computing platform for Convolutional Neural Network (CNN) inference. The Winograd algorithm, weight pruning, and quantization are widely adopted to reduce the storage and arithmetic overhead of CNNs on FPGAs. Recent studies strive to prune the weights in the Winograd domain; however, the resulting sparse patterns are irregular, leading to low parallelism and reduced resource utilization. In addition, few works discuss a suitable quantization scheme for Winograd convolution.
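
For readers unfamiliar with the Winograd algorithm referenced above, the following minimal NumPy sketch (not taken from the paper) shows the standard F(2x2, 3x3) tile computation from Lavin and Gray [11]: a 4x4 input tile and a 3x3 filter are mapped into the Winograd domain, multiplied element-wise (16 multiplications instead of the 36 needed by direct correlation), and transformed back to a 2x2 output tile.

    import numpy as np

    # Standard F(2,3) transform matrices (Lavin and Gray [11]).
    B_T = np.array([[1,  0, -1,  0],
                    [0,  1,  1,  0],
                    [0, -1,  1,  0],
                    [0,  1,  0, -1]], dtype=float)
    G = np.array([[1.0,  0.0, 0.0],
                  [0.5,  0.5, 0.5],
                  [0.5, -0.5, 0.5],
                  [0.0,  0.0, 1.0]])
    A_T = np.array([[1, 1,  1,  0],
                    [0, 1, -1, -1]], dtype=float)

    def winograd_f2x2_3x3(d, g):
        """One 2x2 output tile from a 4x4 input tile d and a 3x3 filter g."""
        U = G @ g @ G.T         # filter transformed into the Winograd domain (4x4)
        V = B_T @ d @ B_T.T     # input tile transformed into the Winograd domain (4x4)
        M = U * V               # 16 element-wise multiplications instead of 36
        return A_T @ M @ A_T.T  # inverse transform back to the spatial domain (2x2)

    # Sanity check against direct (valid) correlation on one tile.
    rng = np.random.default_rng(0)
    d, g = rng.standard_normal((4, 4)), rng.standard_normal((3, 3))
    direct = np.array([[np.sum(d[i:i+3, j:j+3] * g) for j in range(2)] for i in range(2)])
    assert np.allclose(winograd_f2x2_3x3(d, g), direct)
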
In this article, we propose a regular sparse pruning pattern for Winograd-based CNNs, namely the Sub-row-balanced Sparsity (SRBS) pattern, to overcome the challenge of irregular sparse patterns. We then develop a two-step hardware co-optimization approach to improve model accuracy with the SRBS pattern. Based on the pruned model, we apply mixed-precision quantization to further reduce the computational complexity of bit operations. Finally, we design an FPGA accelerator that exploits both the SRBS pattern, to eliminate low-parallelism computation and irregular memory accesses, and the mixed-precision quantization, to obtain a layer-wise bit width. Experimental results on VGG16/VGG-nagadomi with CIFAR-10 and ResNet-18/34/50 with ImageNet show up to 11.8×/8.67× and 8.17×/8.31×/10.6× speedup and 12.74×/9.19× and 8.75×/8.81×/11.1× energy-efficiency improvement, respectively, compared with the state-of-the-art dense Winograd accelerator [20], with negligible loss of model accuracy. We also show that our design achieves a 4.11× speedup over the state-of-the-art sparse Winograd accelerator [19] on VGG16.
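
The precise definition of the SRBS pattern is given in the body of the article; the toy sketch below (with a hypothetical srbs_prune function and parameters of our own naming) only illustrates the general idea suggested by the name and by the related bank-balanced sparsity of [2]: each row of a Winograd-domain weight matrix is split into equal-length sub-rows, and every sub-row keeps the same number of largest-magnitude weights, so the nonzero count per sub-row is identical and hardware processing elements receive balanced workloads.

    import numpy as np

    def srbs_prune(W, sub_row_len=4, k=2):
        """Toy sub-row-balanced pruning (illustrative only): keep the k
        largest-magnitude weights in every length-`sub_row_len` sub-row of a
        2D weight matrix W, so every sub-row ends up with the same number of
        nonzeros."""
        rows, cols = W.shape
        assert cols % sub_row_len == 0, "row length must divide into sub-rows"
        mask = np.zeros_like(W, dtype=bool)
        for r in range(rows):
            for c0 in range(0, cols, sub_row_len):
                sub = np.abs(W[r, c0:c0 + sub_row_len])
                keep = np.argsort(sub)[-k:]      # indices of the k largest magnitudes
                mask[r, c0 + keep] = True
        return W * mask

    # Example: 50% sparsity with exactly 2 nonzeros in each length-4 sub-row.
    W = np.random.default_rng(1).standard_normal((4, 8))
    print(srbs_prune(W, sub_row_len=4, k=2))
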

References

[1]
Wikipedia. 2018. Sparse matrix. Retrieved from https://en.wikipedia.org/wiki/Sparse_matrix.
[2]
Shijie Cao, Chen Zhang, Zhuliang Yao, Wencong Xiao, Lanshun Nie, Dechen Zhan et al. 2019. Efficient and effective sparse LSTM on FPGA with bank-balanced sparsity. In Proceedings of the ACM/SIGDA International Symposium on Field-programmable Gate Arrays. ACM, 63–72.
[3]
Jungwook Choi, Zhuo Wang, Swagath Venkataramani, Pierce I.-Jen Chuang, Vijayalakshmi Srinivasan, and Kailash Gopalakrishnan. 2018. PACT: Parameterized clipping activation for quantized neural networks. Retrieved from http://arxiv.org/abs/1805.06085.
[4]
L. Deng, G. Li, S. Han, L. Shi, and Y. Xie. 2020. Model compression and hardware acceleration for neural networks: A comprehensive survey. Proc. IEEE 108, 4 (2020), 485–532.
[5]
Kaiyuan Guo, Shulin Zeng, Jincheng Yu, Yu Wang, and Huazhong Yang. 2019. [DL] A survey of FPGA-based neural network inference accelerators. ACM Trans. Reconfigurable Technol. Syst. 12, 1, Article 2 (Mar. 2019), 26 pages.
[6]
P. Gysel, J. Pimentel, M. Motamedi, and S. Ghiasi. 2018. Ristretto: A framework for empirical study of resource-efficient inference in convolutional neural networks. IEEE Trans. Neural Netw. Learn. Syst. 29, 11 (2018), 5784–5789.
[7]
Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark A. Horowitz, and William J. Dally. 2016. EIE: Efficient inference engine on compressed deep neural network. SIGARCH Comput. Archit. News 44, 3 (June 2016), 243–254.
[8]
Song Han, Huizi Mao, and William J. Dally. 2016. Deep Compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. In Proceedings of the 4th International Conference on Learning Representations (ICLR'16).
[9]
Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew Howard, Hartwig Adam, and Dmitry Kalenichenko. 2018. Quantization and training of neural networks for efficient integer-arithmetic-only inference. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'18).
[10]
Alex Krizhevsky. 2012. Learning multiple layers of features from tiny images. Technical report, University of Toronto.
[11]
Andrew Lavin and Scott Gray. 2016. Fast algorithms for convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'16).
[12]
G. Li, L. Liu, X. Wang, X. Ma, and X. Feng. 2020. LANCE: Efficient low-precision quantized Winograd convolution for neural networks based on graphics processing units. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP'20). 3842–3846.
[13]
Hao Li, Asim Kadav, Igor Durdanovic, Hanan Samet, and Hans Peter Graf. 2017. Pruning filters for efficient convnets. In Proceedings of the 5th International Conference on Learning Representations (ICLR'17).
[14]
Sheng R. Li, Jongsoo Park, and Ping Tak Peter Tang. 2017. Enabling sparse Winograd convolution by native pruning. Retrieved from http://arxiv.org/abs/1702.08597.
[15]
Hanxiao Liu, Karen Simonyan, and Yiming Yang. 2018. DARTS: Differentiable architecture search. Retrieved from http://arxiv.org/abs/1806.09055.
[16]
Weibo Liu, Zidong Wang, Xiaohui Liu, Nianyin Zeng, Yurong Liu, and Fuad E. Alsaadi. 2017. A survey of deep neural network architectures and their applications. Neurocomputing 234 (2017), 11–26.
[17]
Xingyu Liu, Jeff Pool, Song Han, and William J. Dally. 2018. Efficient sparse-Winograd convolutional neural networks. In Proceedings of the 6th International Conference on Learning Representations (ICLR'18).
[18]
Zhuang Liu, Jianguo Li, Zhiqiang Shen, Gao Huang, Shoumeng Yan, and Changshui Zhang. 2017. Learning efficient convolutional networks through network slimming. In Proceedings of the IEEE International Conference on Computer Vision (ICCV'17). 2755–2763.
[19]
L. Lu and Y. Liang. 2018. SpWA: An efficient sparse Winograd convolutional neural network accelerator on FPGAs. In Proceedings of the 55th ACM/ESDA/IEEE Design Automation Conference (DAC'18). 1–6.
[20]
L. Lu, Y. Liang, Q. Xiao, and S. Yan. 2017. Evaluating fast algorithms for convolutional neural networks on FPGAs. In Proceedings of the IEEE 25th Annual International Symposium on Field-programmable Custom Computing Machines (FCCM'17). 101–108.
[21]
Wenjie Luo, Yujia Li, Raquel Urtasun, and Richard Zemel. 2016. Understanding the effective receptive field in deep convolutional neural networks. In Advances in Neural Information Processing Systems, D. Lee, M. Sugiyama, U. Luxburg, I. Guyon, and R. Garnett (Eds.), Vol. 29. Curran Associates, 4898–4906. Retrieved from https://proceedings.neurips.cc/paper/2016/file/c8067ad1937f728f51288b3eb986afaa-Paper.pdf.
[22]
H. Mao, S. Han, J. Pool, W. Li, X. Liu, Y. Wang, and W. J. Dally. 2017. Exploring the granularity of sparsity in convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW'17). 1927–1934.
[23]
Nagadomi. 2014. Code for kaggle-cifar10 competition, 5th place. Retrieved from https://github.com/nagadomi/kaggle-cifar10-torch7.
[24]
E. Park, D. Kim, and S. Yoo. 2018. Energy-efficient neural network accelerator based on outlier-aware low-precision computation. In Proceedings of the ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA'18). 688–698.
[25]
Adam Paszke et al. 2019. PyTorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems 32, H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett (Eds.). Curran Associates, 8024–8035. Retrieved from http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf.
[26]
Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael S. Bernstein, Alexander C. Berg, and Fei-Fei Li. 2014. ImageNet large scale visual recognition challenge. Int. J. Comput. Vision 115 (2014), 211–252.
[27]
Karen Simonyan and Andrew Zisserman. 2015. Very deep convolutional networks for large-scale image recognition. In Proceedings of the 3rd International Conference on Learning Representations (ICLR'15).
[28]
C. Szegedy, Wei Liu, Yangqing Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. 2015. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR'15). 1–9.
[29]
Yaman Umuroglu and Magnus Jahre. 2017. Streamlined deployment for quantized neural networks. Retrieved from http://arxiv.org/abs/1709.04060.
[30]
Haonan Wang, Wenjian Liu, Tianyi Xu, Jun Lin, and Zhongfeng Wang. 2019. A low-latency sparse-winograd accelerator for convolutional neural networks. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP'19). IEEE, 1448–1452.
[31]
Kuan Wang, Zhijian Liu, Yujun Lin, Ji Lin, and Song Han. 2019. HAQ: Hardware-aware automated quantization with mixed precision. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR'19).
[32]
D. Williamson. 1991. Dynamically scaled fixed point arithmetic. In Proceedings of the IEEE Pacific Rim Conference on Communications, Computers and Signal Processing. 315–318, Vol. 1.
[33]
T. Yang, Y. Liao, J. Shi, Y. Liang, N. Jing, and L. Jiang. 2020. A Winograd-based CNN accelerator with a fine-grained regular sparsity pattern. In Proceedings of the 30th International Conference on Field-programmable Logic and Applications (FPL'20). 254–261.
[34]
Haibao Yu, Qi Han, Jianbo Li, Jianping Shi, Guangliang Cheng, and Bin Fan. 2020. Search what you want: Barrier penalty NAS for mixed precision quantization. Retrieved from http://arxiv.org/abs/2007.10026.
[35]
Jiecao Yu, Jongsoo Park, and Maxim Naumov. 2019. Spatial-Winograd pruning enabling sparse Winograd convolution. Retrieved from http://arxiv.org/abs/1901.02132.
[36]
Aojun Zhou, Anbang Yao, Yiwen Guo, Lin Xu, and Yurong Chen. 2017. Incremental network quantization: Towards lossless CNNs with low-precision weights. Retrieved from http://arxiv.org/abs/1702.03044.
[37]
Shuchang Zhou, Zekun Ni, Xinyu Zhou, He Wen, Yuxin Wu, and Yuheng Zou. 2016. DoReFa-Net: Training low bitwidth convolutional neural networks with low bitwidth gradients. Retrieved from http://arxiv.org/abs/1606.06160.
[38]
Chenzhuo Zhu, Song Han, Huizi Mao, and William J. Dally. 2016. Trained ternary quantization. Retrieved from http://arxiv.org/abs/1612.01064.

      Published In

      ACM Transactions on Reconfigurable Technology and Systems, Volume 14, Issue 4
      December 2021
      165 pages
      ISSN: 1936-7406
      EISSN: 1936-7414
      DOI: 10.1145/3483341
      Editor: Deming Chen

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 13 September 2021
      Accepted: 01 May 2021
      Revised: 01 February 2021
      Received: 01 October 2020
      Published in TRETS Volume 14, Issue 4

      Author Tags

      1. Winograd fast algorithm
      2. convolutional neural networks
      3. hardware-friendly sparsity
      4. mixed precision quantization
      5. FPGA

      Qualifiers

      • Research-article
      • Refereed

      Cited By

      • (2025) A Precision-Scalable Sparse CNN Accelerator with Fine-Grained Mixed Bitwidth Configurability. IEICE Electronics Express. DOI: 10.1587/elex.22.20240601. Online publication date: 2025.
      • (2024) Winols: A Large-Tiling Sparse Winograd CNN Accelerator on FPGAs. ACM Transactions on Architecture and Code Optimization 21, 2, 1–24. DOI: 10.1145/3643682. Online publication date: 23 March 2024.
      • (2024) Advances in Pruning and Quantization for Natural Language Processing. IEEE Access 12, 139113–139128. DOI: 10.1109/ACCESS.2024.3465631. Online publication date: 2024.
      • (2023) DTATrans: Leveraging Dynamic Token-Based Quantization with Accuracy Compensation Mechanism for Efficient Transformer Architecture. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 42, 2, 509–520. DOI: 10.1109/TCAD.2022.3181541. Online publication date: February 2023.
