Accelerating Large Kernel Convolutions with Nested Winograd Transformation

  • Conference paper
VLSI-SoC 2023: Innovations for Trustworthy Artificial Intelligence (VLSI-SoC 2023)

Abstract

Recent literature has shown that convolutional neural networks (CNNs) with large kernels outperform vision transformers (ViTs) and CNNs with stacked small kernels in many computer vision tasks, such as object detection and image restoration. The Winograd transformation reduces the number of multiplications required by convolution and is widely supported by commercial AI processors. Researchers have proposed accelerating large kernel convolutions by linearly decomposing them into many small kernel convolutions and then sequentially accelerating each small kernel convolution with the Winograd algorithm. This work proposes a nested Winograd algorithm that iteratively decomposes a large kernel convolution into small kernel convolutions, and proves it to be more effective than the linear decomposition Winograd algorithm. Experiments show that, compared to the linear decomposition Winograd algorithm, the proposed algorithm reduces the total number of multiplications by a factor of 1.4 to 10.5 for computing 4\(\times\)4 to 31\(\times\)31 convolutions.
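
To make the abstract concrete, the sketch below illustrates the two ingredients it builds on: the classic 1D Winograd transform F(2,3) (as popularized for CNNs by Lavin and Gray), which produces 2 outputs of a 3-tap convolution with 4 multiplications instead of 6, and the decomposition of a larger kernel into 3-tap sub-kernels so that each piece becomes Winograd-friendly. This is an illustrative reconstruction under our own assumptions, not the paper's method: the function names and the 1D 5-tap example are ours, and the decomposition shown is the linear (non-nested) scheme the paper improves upon.

```python
import numpy as np

# Winograd F(2,3): 2 outputs of a 3-tap correlation (CNN-style convolution)
# from a 4-sample tile, using 4 elementwise multiplications instead of the
# 6 that a direct computation needs. The transform matrices contain only
# 0, +/-1 and 0.5, so applying them costs additions and shifts rather than
# general multiplications.
BT = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=float)
G  = np.array([[1.0,  0.0, 0.0],
               [0.5,  0.5, 0.5],
               [0.5, -0.5, 0.5],
               [0.0,  0.0, 1.0]])
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=float)

def winograd_f23(d, g):
    """d: input tile of 4 samples, g: 3-tap kernel -> 2 outputs."""
    return AT @ ((G @ g) * (BT @ d))  # the 4 multiplications are the '*'

def conv_by_decomposition(x, k5):
    """5-tap convolution as a shifted sum of two 3-tap sub-convolutions.

    k5 is split into [k0, k1, k2] and [k3, k4, 0]; each sub-convolution
    is then computed tile by tile with F(2,3).
    """
    n_out = len(x) - len(k5) + 1
    k_lo, k_hi = k5[:3], np.pad(k5[3:], (0, 1))  # zero-pad the short tail
    xp = np.pad(x, (0, 3))        # slack so the last tile stays in range
    y = np.zeros(n_out + 1)       # one spare slot when n_out is odd
    for i in range(0, n_out, 2):  # 2 outputs per Winograd tile
        y[i:i+2] = (winograd_f23(xp[i:i+4], k_lo)
                    + winograd_f23(xp[i+3:i+7], k_hi))
    return y[:n_out]

# Check against a direct sliding dot product.
x = np.arange(12, dtype=float)
k = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
ref = np.array([np.dot(x[i:i+5], k) for i in range(len(x) - 4)])
assert np.allclose(conv_by_decomposition(x, k), ref)
```

Per pair of outputs, the direct 5-tap computation costs 10 multiplications, while the two F(2,3) calls above cost 8. The paper's contribution is to nest the Winograd transform into the decomposition itself rather than applying it once per sub-kernel as done here, which is where the reported 1.4 to 10.5 times reduction over the linear scheme comes from.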

Acknowledgment

We would like to extend our sincere gratitude to the Hong Kong AI Chip Center for Emerging Smart Systems (ACCESS) for its pivotal support of our work.

Author information

Corresponding author

Correspondence to Xizi Chen.

Copyright information

© 2024 IFIP International Federation for Information Processing

About this paper

Cite this paper

Jiang, J., Chen, X., Tsui, C.Y. (2024). Accelerating Large Kernel Convolutions with Nested Winograd Transformation. In: Elfadel, I.M., Albasha, L. (eds.) VLSI-SoC 2023: Innovations for Trustworthy Artificial Intelligence. IFIP Advances in Information and Communication Technology, vol. 680. Springer, Cham. https://doi.org/10.1007/978-3-031-70947-0_6

  • DOI: https://doi.org/10.1007/978-3-031-70947-0_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-70946-3

  • Online ISBN: 978-3-031-70947-0

  • eBook Packages: Computer Science, Computer Science (R0)
