Abstract
Convolution operations are essential components of modern CNNs (Convolutional Neural Networks) and are also the most time-consuming. Several fast convolution algorithms, including FFT-based and Winograd convolution, have been proposed to address this problem. Winograd convolution improves the inference performance of convolution operators with small kernels, which are the mainstream in current popular CNNs. However, the implementations of Winograd convolution in many highly optimized deep neural network libraries and deep learning compilers are inefficient: the complex data dependencies among the four stages of Winograd convolution make it very challenging to optimize. In this paper, we improve the inference performance of the Winograd convolution operator on GPUs. We propose a sync-free implementation of the computation stage of Winograd convolution and further propose PKF (Partial Kernel Fusion) methods that utilize the different memory levels of GPUs. We implemented PKF-Reconstructor, based on TVM, for PKF Winograd convolution. Evaluations on convolution operators from real-world CNNs show that our method achieves a speedup of 8.22×–13.69× over cuDNN and 4.89×–9.10× over the fastest vanilla TVM Winograd implementation.
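To make the four stages concrete, below is a minimal NumPy sketch of single-tile Winograd convolution for F(2×2, 3×3), the small-kernel case targeted here. The transform matrices follow Lavin and Gray's formulation; the function and tile handling are illustrative assumptions, not the paper's fused GPU implementation.

```python
# Minimal single-tile sketch of the four Winograd stages for F(2x2, 3x3).
# Illustrative only: real implementations batch over tiles and channels and,
# as in this paper, fuse stages to keep intermediates in fast GPU memory.
import numpy as np

# Transform matrices for F(2x2, 3x3) (Lavin & Gray):
Bt = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=np.float32)
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]], dtype=np.float32)
At = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=np.float32)

def winograd_f2x2_3x3(d, g):
    """Apply the four stages to one 4x4 input tile d and one 3x3 kernel g."""
    U = G @ g @ G.T       # stage 1: kernel transform
    V = Bt @ d @ Bt.T     # stage 2: input transform
    M = U * V             # stage 3: element-wise multiplication
    return At @ M @ At.T  # stage 4: output transform -> 2x2 output tile

# Sanity check against direct 2D cross-correlation on the same tile.
rng = np.random.default_rng(0)
d = rng.standard_normal((4, 4)).astype(np.float32)
g = rng.standard_normal((3, 3)).astype(np.float32)
direct = np.array([[(d[i:i+3, j:j+3] * g).sum() for j in range(2)]
                   for i in range(2)])
assert np.allclose(winograd_f2x2_3x3(d, g), direct, atol=1e-4)
```

When the stages run as separate GPU kernels, each intermediate (U, V, M) makes a round trip through global memory; the partial kernel fusion proposed here instead keeps such intermediates in faster on-chip memory levels between stages.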
This work is supported in part by NSFC (No. 61872374, 62090023, 62172430), NSFHN (No. 2022JJ10064, 2021JJ10052) and NKRDP (No. 2021YFB0300300).
Copyright information
© 2022 IFIP International Federation for Information Processing
Cite this paper
Tong, G. et al. (2022). Optimizing Winograd Convolution on GPUs via Partial Kernel Fusion. In: Liu, S., Wei, X. (eds) Network and Parallel Computing. NPC 2022. Lecture Notes in Computer Science, vol 13615. Springer, Cham. https://doi.org/10.1007/978-3-031-21395-3_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-21394-6
Online ISBN: 978-3-031-21395-3