Abstract
Convolutional neural networks (CNNs) are widely applied in image and video recognition, recommender systems, and natural language processing. However, CNNs are computationally intensive, and their cost can be prohibitive. Since convolution accounts for the bulk of a CNN's operations, many algorithms have been proposed to accelerate the convolutional layers. Each algorithm has its own strengths and weaknesses, and no single one handles all situations best. In this paper, we examine the performance of these algorithms in a GPU environment. By building a customized CNN model, we fully explore how the network structure affects each algorithm's performance, including inference/training speed and memory consumption. Beyond the algorithms themselves, we also study how their GPU implementations affect performance. Finally, we summarize the characteristics of each algorithm and design a scheduling strategy that assigns the appropriate implementation to each convolutional layer of a CNN. With our strategy, AlexNet runs 1.2x to 2.8x faster than with other strategies in a GPU environment. This work is important for understanding these algorithms and may provide insights for further optimization of GPU and accelerator architectures.
This work is supported by the National Natural Science Foundation of China (No. 61672526) and Research Project of NUDT (ZK17-03-06).
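The abstract only sketches the scheduling idea; the concrete selection mechanism is in the paper body. As a rough illustration of per-layer algorithm assignment (not the authors' actual code), the hypothetical helper below uses cuDNN's autotuning API to benchmark every forward-convolution algorithm (GEMM-, FFT-, and Winograd-based) for a single layer and picks the fastest one whose workspace fits a memory budget. All function names, layer parameters, and the budget are our own illustrative assumptions.

```cpp
// select_conv_algo.cpp -- hypothetical per-layer algorithm selection sketch.
#include <cudnn.h>
#include <cstdio>
#include <cstdlib>

#define CHECK(call)                                                      \
  do {                                                                   \
    cudnnStatus_t st_ = (call);                                          \
    if (st_ != CUDNN_STATUS_SUCCESS) {                                   \
      fprintf(stderr, "cuDNN error: %s (line %d)\n",                     \
              cudnnGetErrorString(st_), __LINE__);                       \
      exit(1);                                                           \
    }                                                                    \
  } while (0)

// Benchmark all forward algorithms cuDNN offers for one conv layer and
// return the fastest whose workspace fits the given memory budget.
cudnnConvolutionFwdAlgo_t pickAlgo(cudnnHandle_t handle,
                                   int n, int c, int hIn, int wIn,  // input
                                   int k, int r, int s,             // filters
                                   int pad, int stride,
                                   size_t memBudget) {
  cudnnTensorDescriptor_t x, y;
  cudnnFilterDescriptor_t w;
  cudnnConvolutionDescriptor_t conv;
  CHECK(cudnnCreateTensorDescriptor(&x));
  CHECK(cudnnCreateTensorDescriptor(&y));
  CHECK(cudnnCreateFilterDescriptor(&w));
  CHECK(cudnnCreateConvolutionDescriptor(&conv));

  CHECK(cudnnSetTensor4dDescriptor(x, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT,
                                   n, c, hIn, wIn));
  CHECK(cudnnSetFilter4dDescriptor(w, CUDNN_DATA_FLOAT, CUDNN_TENSOR_NCHW,
                                   k, c, r, s));
  CHECK(cudnnSetConvolution2dDescriptor(conv, pad, pad, stride, stride, 1, 1,
                                        CUDNN_CROSS_CORRELATION,
                                        CUDNN_DATA_FLOAT));
  int on, oc, oh, ow;
  CHECK(cudnnGetConvolution2dForwardOutputDim(conv, x, w, &on, &oc, &oh, &ow));
  CHECK(cudnnSetTensor4dDescriptor(y, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT,
                                   on, oc, oh, ow));

  // cudnnFindConvolutionForwardAlgorithm actually runs every algorithm
  // and reports measured time and workspace size, sorted by time.
  cudnnConvolutionFwdAlgoPerf_t perf[CUDNN_CONVOLUTION_FWD_ALGO_COUNT];
  int found = 0;
  CHECK(cudnnFindConvolutionForwardAlgorithm(
      handle, x, w, conv, y, CUDNN_CONVOLUTION_FWD_ALGO_COUNT, &found, perf));

  // Take the fastest algorithm that fits the budget; fall back to
  // implicit GEMM, which needs no extra workspace.
  cudnnConvolutionFwdAlgo_t best = CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_GEMM;
  for (int i = 0; i < found; ++i) {
    if (perf[i].status == CUDNN_STATUS_SUCCESS &&
        perf[i].memory <= memBudget) {
      best = perf[i].algo;
      break;
    }
  }
  cudnnDestroyTensorDescriptor(x);
  cudnnDestroyTensorDescriptor(y);
  cudnnDestroyFilterDescriptor(w);
  cudnnDestroyConvolutionDescriptor(conv);
  return best;
}

int main() {
  cudnnHandle_t handle;
  CHECK(cudnnCreate(&handle));
  // Example: AlexNet's first conv layer (227x227x3 input, 96 filters of
  // 11x11, stride 4, no padding), batch 128, 256 MiB workspace budget.
  cudnnConvolutionFwdAlgo_t a =
      pickAlgo(handle, 128, 3, 227, 227, 96, 11, 11, 0, 4, 256u << 20);
  printf("chosen algorithm id: %d\n", (int)a);
  cudnnDestroy(handle);
  return 0;
}
```

In a full network, one would run such a selection once per convolutional layer and cache the choice, which mirrors the autotuning behavior available in common frameworks built on cuDNN.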
Cite this paper
Xu, R., Ma, S., Li, W., Guo, Y.: Accelerating CNNs Using Optimized Scheduling Strategy. In: Vaidya, J., Li, J. (eds.) Algorithms and Architectures for Parallel Processing, ICA3PP 2018. Lecture Notes in Computer Science, vol. 11336. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-05057-3_15