Accelerating Large Kernel Convolutions with Nested Winograd Transformation

  • Conference paper
VLSI-SoC 2023: Innovations for Trustworthy Artificial Intelligence (VLSI-SoC 2023)

Abstract

Recent literature has shown that convolutional neural networks (CNNs) with large kernels outperform vision transformers (ViTs) and CNNs with stacked small kernels in many computer vision tasks, such as object detection and image restoration. The Winograd transformation reduces the number of multiplications required by convolution and is widely supported by commercial AI processors. Researchers have proposed accelerating large kernel convolutions by linearly decomposing them into many small kernel convolutions and then sequentially accelerating each small kernel convolution with the Winograd algorithm. This work proposes a nested Winograd algorithm that iteratively decomposes a large kernel convolution into small kernel convolutions, and proves it to be more effective than the linear decomposition Winograd algorithm. Experiments show that, compared to the linear decomposition Winograd algorithm, the proposed algorithm reduces the total number of multiplications by a factor of 1.4 to 10.5 for computing 4\(\times\)4 to 31\(\times\)31 convolutions.
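
To make the abstract concrete, the sketch below illustrates the two ingredients it builds on: the classic 1D Winograd transform F(2,3) (as popularized for CNNs by Lavin and Gray), which produces 2 outputs of a 3-tap convolution with 4 multiplications instead of 6, and the decomposition of a larger kernel into 3-tap sub-kernels so that each piece becomes Winograd-friendly. This is an illustrative reconstruction under our own assumptions, not the paper's method: the function names and the 1D 5-tap example are ours, and the decomposition shown is the linear (non-nested) scheme the paper improves upon.

```python
import numpy as np

# Winograd F(2,3): 2 outputs of a 3-tap correlation (CNN-style convolution)
# from a 4-sample tile, using 4 elementwise multiplications instead of the
# 6 that a direct computation needs. The transform matrices contain only
# 0, +/-1 and 0.5, so applying them costs additions and shifts rather than
# general multiplications.
BT = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=float)
G  = np.array([[1.0,  0.0, 0.0],
               [0.5,  0.5, 0.5],
               [0.5, -0.5, 0.5],
               [0.0,  0.0, 1.0]])
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=float)

def winograd_f23(d, g):
    """d: input tile of 4 samples, g: 3-tap kernel -> 2 outputs."""
    return AT @ ((G @ g) * (BT @ d))  # the 4 multiplications are the '*'

def conv_by_decomposition(x, k5):
    """5-tap convolution as a shifted sum of two 3-tap sub-convolutions.

    k5 is split into [k0, k1, k2] and [k3, k4, 0]; each sub-convolution
    is then computed tile by tile with F(2,3).
    """
    n_out = len(x) - len(k5) + 1
    k_lo, k_hi = k5[:3], np.pad(k5[3:], (0, 1))  # zero-pad the short tail
    xp = np.pad(x, (0, 3))        # slack so the last tile stays in range
    y = np.zeros(n_out + 1)       # one spare slot when n_out is odd
    for i in range(0, n_out, 2):  # 2 outputs per Winograd tile
        y[i:i+2] = (winograd_f23(xp[i:i+4], k_lo)
                    + winograd_f23(xp[i+3:i+7], k_hi))
    return y[:n_out]

# Check against a direct sliding dot product.
x = np.arange(12, dtype=float)
k = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
ref = np.array([np.dot(x[i:i+5], k) for i in range(len(x) - 4)])
assert np.allclose(conv_by_decomposition(x, k), ref)
```

Per pair of outputs, the direct 5-tap computation costs 10 multiplications, while the two F(2,3) calls above cost 8. The paper's contribution is to nest the Winograd transform into the decomposition itself rather than applying it once per sub-kernel as done here, which is where the reported 1.4 to 10.5 times reduction over the linear scheme comes from.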

Acknowledgment

We would like to extend our sincere gratitude to the Hong Kong AI Chip Center for Emerging Smart Systems (ACCESS) for its pivotal support of our work.

Author information

Corresponding author

Correspondence to Xizi Chen.

Copyright information

© 2024 IFIP International Federation for Information Processing

About this paper

Cite this paper

Jiang, J., Chen, X., Tsui, C.Y. (2024). Accelerating Large Kernel Convolutions with Nested Winograd Transformation. In: Elfadel, I.M., Albasha, L. (eds.) VLSI-SoC 2023: Innovations for Trustworthy Artificial Intelligence. IFIP Advances in Information and Communication Technology, vol. 680. Springer, Cham. https://doi.org/10.1007/978-3-031-70947-0_6

  • DOI: https://doi.org/10.1007/978-3-031-70947-0_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-70946-3

  • Online ISBN: 978-3-031-70947-0

  • eBook Packages: Computer Science, Computer Science (R0)
