Abstract
Convolutional neural networks (CNNs) are widely applied in image and video recognition, recommender systems, and natural language processing. However, CNNs are computationally intensive, and their cost can be prohibitive. Since convolution accounts for the bulk of a CNN's operations, many algorithms have been proposed to accelerate the convolutional layers. Each algorithm has its own strengths and weaknesses, and no single one handles all situations best. In this paper, we examine the performance of these algorithms in a GPU environment. By building a customized CNN model, we fully explore how the network structure affects each algorithm's performance, including inference/training speed and memory consumption. Beyond the algorithms themselves, we also study how their GPU implementations affect performance. Finally, we summarize the characteristics of each algorithm and design a scheduling strategy that assigns the appropriate implementation to each convolutional layer of a CNN. With our strategy, AlexNet runs 1.2x to 2.8x faster than with other strategies in a GPU environment. This work is important for understanding these algorithms and may provide insights for further optimization of GPU and accelerator architectures.
This work is supported by the National Natural Science Foundation of China (No. 61672526) and Research Project of NUDT (ZK17-03-06).
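The abstract only sketches the scheduling idea; the concrete selection mechanism is in the paper body. As a rough illustration of per-layer algorithm assignment (not the authors' actual code), the hypothetical helper below uses cuDNN's autotuning API to benchmark every forward-convolution algorithm (GEMM-, FFT-, and Winograd-based) for a single layer and picks the fastest one whose workspace fits a memory budget. All function names, layer parameters, and the budget are our own illustrative assumptions.

```cpp
// select_conv_algo.cpp -- hypothetical per-layer algorithm selection sketch.
#include <cudnn.h>
#include <cstdio>
#include <cstdlib>

#define CHECK(call)                                                      \
  do {                                                                   \
    cudnnStatus_t st_ = (call);                                          \
    if (st_ != CUDNN_STATUS_SUCCESS) {                                   \
      fprintf(stderr, "cuDNN error: %s (line %d)\n",                     \
              cudnnGetErrorString(st_), __LINE__);                       \
      exit(1);                                                           \
    }                                                                    \
  } while (0)

// Benchmark all forward algorithms cuDNN offers for one conv layer and
// return the fastest whose workspace fits the given memory budget.
cudnnConvolutionFwdAlgo_t pickAlgo(cudnnHandle_t handle,
                                   int n, int c, int hIn, int wIn,  // input
                                   int k, int r, int s,             // filters
                                   int pad, int stride,
                                   size_t memBudget) {
  cudnnTensorDescriptor_t x, y;
  cudnnFilterDescriptor_t w;
  cudnnConvolutionDescriptor_t conv;
  CHECK(cudnnCreateTensorDescriptor(&x));
  CHECK(cudnnCreateTensorDescriptor(&y));
  CHECK(cudnnCreateFilterDescriptor(&w));
  CHECK(cudnnCreateConvolutionDescriptor(&conv));

  CHECK(cudnnSetTensor4dDescriptor(x, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT,
                                   n, c, hIn, wIn));
  CHECK(cudnnSetFilter4dDescriptor(w, CUDNN_DATA_FLOAT, CUDNN_TENSOR_NCHW,
                                   k, c, r, s));
  CHECK(cudnnSetConvolution2dDescriptor(conv, pad, pad, stride, stride, 1, 1,
                                        CUDNN_CROSS_CORRELATION,
                                        CUDNN_DATA_FLOAT));
  int on, oc, oh, ow;
  CHECK(cudnnGetConvolution2dForwardOutputDim(conv, x, w, &on, &oc, &oh, &ow));
  CHECK(cudnnSetTensor4dDescriptor(y, CUDNN_TENSOR_NCHW, CUDNN_DATA_FLOAT,
                                   on, oc, oh, ow));

  // cudnnFindConvolutionForwardAlgorithm actually runs every algorithm
  // and reports measured time and workspace size, sorted by time.
  cudnnConvolutionFwdAlgoPerf_t perf[CUDNN_CONVOLUTION_FWD_ALGO_COUNT];
  int found = 0;
  CHECK(cudnnFindConvolutionForwardAlgorithm(
      handle, x, w, conv, y, CUDNN_CONVOLUTION_FWD_ALGO_COUNT, &found, perf));

  // Take the fastest algorithm that fits the budget; fall back to
  // implicit GEMM, which needs no extra workspace.
  cudnnConvolutionFwdAlgo_t best = CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_GEMM;
  for (int i = 0; i < found; ++i) {
    if (perf[i].status == CUDNN_STATUS_SUCCESS &&
        perf[i].memory <= memBudget) {
      best = perf[i].algo;
      break;
    }
  }
  cudnnDestroyTensorDescriptor(x);
  cudnnDestroyTensorDescriptor(y);
  cudnnDestroyFilterDescriptor(w);
  cudnnDestroyConvolutionDescriptor(conv);
  return best;
}

int main() {
  cudnnHandle_t handle;
  CHECK(cudnnCreate(&handle));
  // Example: AlexNet's first conv layer (227x227x3 input, 96 filters of
  // 11x11, stride 4, no padding), batch 128, 256 MiB workspace budget.
  cudnnConvolutionFwdAlgo_t a =
      pickAlgo(handle, 128, 3, 227, 227, 96, 11, 11, 0, 4, 256u << 20);
  printf("chosen algorithm id: %d\n", (int)a);
  cudnnDestroy(handle);
  return 0;
}
```

In a full network, one would run such a selection once per convolutional layer and cache the choice, which mirrors the autotuning behavior available in common frameworks built on cuDNN.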
Cite this paper
Xu, R., Ma, S., Li, W., Guo, Y.: Accelerating CNNs Using Optimized Scheduling Strategy. In: Vaidya, J., Li, J. (eds.) Algorithms and Architectures for Parallel Processing, ICA3PP 2018. Lecture Notes in Computer Science, vol. 11336. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-05057-3_15