ConvDarts: a fast and exact convolutional algorithm selector for deep learning frameworks

  • Regular Paper
  • Published:
CCF Transactions on High Performance Computing

Abstract

Convolution is one of the most time-consuming operations in training deep neural networks. Existing convolution algorithms, such as FFT, GEMM, Winograd, and their variants, differ in both time and space performance, and no single algorithm is best for all convolution configurations (the parameters of a convolutional operation). This paper addresses the problem of selecting a convolution algorithm for a given configuration and proposes ConvDarts, a fast and exact selector. We propose an informed cache that is preset with common convolution configurations and their optimal algorithm indices, together with a lightweight machine learning model that predicts the optimal algorithm for configurations that miss the cache. Compared with the heuristic and profiling approaches used in cuDNN, ConvDarts reduces both the training time of classical deep learning networks and the required memory space, providing more possibilities for training network models in resource-constrained environments.
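The two-level scheme the abstract describes can be illustrated with a short sketch: an informed cache is consulted first, and a lightweight learned model is consulted only on a miss. The following Python sketch is illustrative only; the names (ConvConfig, its fields, the injected model) and the integer algorithm indices are assumptions standing in for the authors' actual implementation and for cuDNN's algorithm enumerants.

```python
from dataclasses import dataclass, astuple

@dataclass(frozen=True)
class ConvConfig:
    """Hypothetical key describing one convolution configuration."""
    batch: int
    in_channels: int
    out_channels: int
    height: int
    width: int
    kernel: int
    stride: int
    padding: int

class ConvAlgorithmSelector:
    """Sketch of the cache-plus-model selection described in the abstract."""

    def __init__(self, preset_cache: dict, model):
        # Informed cache: common configurations -> optimal algorithm index,
        # preset offline (assumption: indices mirror cuDNN's algorithm enum).
        self.cache = dict(preset_cache)
        # Lightweight predictor with a scikit-learn-style predict() method
        # (assumption; the abstract does not name the model family).
        self.model = model

    def select(self, cfg: ConvConfig) -> int:
        algo = self.cache.get(cfg)
        if algo is None:
            # Cache miss: predict the optimal algorithm from the
            # configuration's numeric parameters, then memoize the answer.
            algo = int(self.model.predict([list(astuple(cfg))])[0])
            self.cache[cfg] = algo
        return algo
```

Under these assumptions, any classifier exposing a predict() method over the configuration's numeric parameters would fit. The design point the abstract makes is that a cache hit costs only a lookup, and a miss costs a single model inference rather than exhaustively profiling every candidate algorithm as cuDNN's benchmarking approach does.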


Data availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.


Author information

Corresponding author

Correspondence to Weixing Ji.

Ethics declarations

Conflict of interest statement

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Bai, L., Ji, W., Li, Q. et al. ConvDarts: a fast and exact convolutional algorithm selector for deep learning frameworks. CCF Trans. HPC 6, 32–44 (2024). https://doi.org/10.1007/s42514-023-00167-7

