Abstract
Recent literature has shown that convolutional neural networks (CNNs) with large kernels outperform vision transformers (ViTs) and CNNs with stacked small kernels in many computer vision tasks, such as object detection and image restoration. The Winograd transformation helps reduce the number of repetitive multiplications in convolution and is widely supported by many commercial AI processors. Researchers have proposed accelerating large kernel convolutions by linearly decomposing them into many small kernel convolutions and then sequentially accelerating each small kernel convolution with the Winograd algorithm. This work proposes a nested Winograd algorithm that iteratively decomposes a large kernel convolution into small kernel convolutions and proves it to be more effective than the linear decomposition Winograd algorithm. Experiments show that, compared to the linear decomposition Winograd algorithm, the proposed algorithm reduces the total number of multiplications by a factor of 1.4 to 10.5 for computing 4×4 to 31×31 convolutions.
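For readers unfamiliar with the baseline, the following is a minimal NumPy sketch of the standard 1-D Winograd algorithm F(2,3) on which both the linear-decomposition approach and the nested approach build: it computes two outputs of a 3-tap convolution with 4 multiplications instead of the naive 6. It illustrates only the classic transform, not the paper's nested decomposition; the matrix names and the helper winograd_f23 are our own illustrative choices.

```python
import numpy as np

# Winograd F(2,3): two outputs of a 1-D convolution with a 3-tap
# kernel, using 4 elementwise multiplications instead of the naive 6.
BT = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=float)   # input transform
G  = np.array([[1.0,  0.0, 0.0],
               [0.5,  0.5, 0.5],
               [0.5, -0.5, 0.5],
               [0.0,  0.0, 1.0]])               # kernel transform
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=float)    # output transform

def winograd_f23(d, g):
    """d: length-4 input tile, g: length-3 kernel -> 2 outputs."""
    # The 4 multiplications are the elementwise product below.
    return AT @ ((G @ g) * (BT @ d))

# Sanity check against direct sliding-window correlation.
d = np.array([0.0, 1.0, 2.0, 3.0])
g = np.array([1.0, 2.0, 3.0])
direct = np.array([d[0:3] @ g, d[1:4] @ g])
assert np.allclose(winograd_f23(d, g), direct)
print(winograd_f23(d, g))  # [ 8. 14.]
```

In 2-D, the same transforms are applied on both sides of the elementwise product; a large kernel is then handled either by decomposing it into a sum of small kernels (the linear approach) or, as this paper proposes, by applying the decomposition iteratively in a nested fashion.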
Acknowledgment
We would like to extend our sincere gratitude to the Hong Kong AI Chip Center for Emerging Smart Systems (ACCESS) for its pivotal support of this work.
Copyright information
© 2024 IFIP International Federation for Information Processing
About this paper
Cite this paper
Jiang, J., Chen, X., Tsui, C.Y. (2024). Accelerating Large Kernel Convolutions with Nested Winograd Transformation. In: Elfadel, I.M., Albasha, L. (eds.) VLSI-SoC 2023: Innovations for Trustworthy Artificial Intelligence. IFIP Advances in Information and Communication Technology, vol. 680. Springer, Cham. https://doi.org/10.1007/978-3-031-70947-0_6
Print ISBN: 978-3-031-70946-3
Online ISBN: 978-3-031-70947-0