ConvDarts: a fast and exact convolutional algorithm selector for deep learning frameworks

  • Regular Paper
  • Published:
CCF Transactions on High Performance Computing

Abstract

Convolution is one of the most time-consuming operations in training deep neural networks. Existing convolution algorithms, such as FFT, GEMM, Winograd, and their variants, differ in both time and space performance, and no single algorithm is best for all convolution configurations (the parameters of a convolutional operation). This paper addresses the problem of selecting a convolution algorithm for a given configuration and proposes ConvDarts, a fast and exact selector. We propose an informed cache that is preset with common convolution configurations and their optimal algorithm indices, together with a lightweight machine learning model that predicts the optimal algorithm for configurations that miss the cache. Compared with the heuristic and profiling approaches used in cuDNN, ConvDarts reduces both the training time of classical deep learning networks and the required memory space, providing more possibilities for training network models in resource-constrained environments.
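The two-level scheme the abstract describes can be illustrated with a short sketch: an informed cache is consulted first, and a lightweight learned model is consulted only on a miss. The following Python sketch is illustrative only; the names (ConvConfig, its fields, the injected model) and the integer algorithm indices are assumptions standing in for the authors' actual implementation and for cuDNN's algorithm enumerants.

```python
from dataclasses import dataclass, astuple

@dataclass(frozen=True)
class ConvConfig:
    """Hypothetical key describing one convolution configuration."""
    batch: int
    in_channels: int
    out_channels: int
    height: int
    width: int
    kernel: int
    stride: int
    padding: int

class ConvAlgorithmSelector:
    """Sketch of the cache-plus-model selection described in the abstract."""

    def __init__(self, preset_cache: dict, model):
        # Informed cache: common configurations -> optimal algorithm index,
        # preset offline (assumption: indices mirror cuDNN's algorithm enum).
        self.cache = dict(preset_cache)
        # Lightweight predictor with a scikit-learn-style predict() method
        # (assumption; the abstract does not name the model family).
        self.model = model

    def select(self, cfg: ConvConfig) -> int:
        algo = self.cache.get(cfg)
        if algo is None:
            # Cache miss: predict the optimal algorithm from the
            # configuration's numeric parameters, then memoize the answer.
            algo = int(self.model.predict([list(astuple(cfg))])[0])
            self.cache[cfg] = algo
        return algo
```

Under these assumptions, any classifier exposing a predict() method over the configuration's numeric parameters would fit. The design point the abstract makes is that a cache hit costs only a lookup, and a miss costs a single model inference rather than exhaustively profiling every candidate algorithm as cuDNN's benchmarking approach does.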


Data availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.


Author information

Corresponding author

Correspondence to Weixing Ji.

Ethics declarations

Conflict of interest statement

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Bai, L., Ji, W., Li, Q. et al. ConvDarts: a fast and exact convolutional algorithm selector for deep learning frameworks. CCF Trans. HPC 6, 32–44 (2024). https://doi.org/10.1007/s42514-023-00167-7

