Abstract
Nowadays, deploying deep learning models on inexpensive and widely used devices has created a strong demand for computation-efficient neural networks, and many lightweight networks have been proposed, such as the MobileNet series and ShuffleNet. These computation-efficient models are specifically designed for very limited computational budgets (e.g., 10–150 MFLOPs) and run efficiently on ARM-based devices. They have a smaller computation-to-memory ratio (CMR) than large networks such as VGG, ResNet, and Inception.
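To make the CMR comparison concrete, the sketch below counts FLOPs and memory traffic for three layers: a standard 3×3 convolution, a depthwise 3×3 convolution, and a pointwise 1×1 convolution. The last two are the building blocks of depthwise separable networks such as MobileNet. The sketch rests on several assumptions:

- stride 1 with same padding;
- one multiply-accumulate counted as 2 FLOPs;
- float32 tensors;
- each input, weight, and output element moved exactly once, which is an idealized lower bound on traffic.

The function names and the 14×14×512 layer shape are hypothetical and chosen only for illustration.

```python
def conv_cost(h, w, c_in, c_out, k, groups=1, bytes_per_elem=4):
    """Return (flops, bytes) for a stride-1, same-padded convolution.

    Assumptions (illustrative, not from the paper): one multiply-accumulate
    counts as 2 FLOPs, and input, weights, and output are each moved
    exactly once, which gives a lower bound on memory traffic.
    """
    macs = h * w * c_out * (c_in // groups) * k * k
    flops = 2 * macs
    weights = c_out * (c_in // groups) * k * k
    elems = h * w * c_in + weights + h * w * c_out
    return flops, elems * bytes_per_elem


def cmr(flops, nbytes):
    """Computation-to-memory ratio in FLOPs per byte."""
    return flops / nbytes


if __name__ == "__main__":
    h = w = 14
    c = 512  # hypothetical layer shape
    layers = {
        "standard 3x3": conv_cost(h, w, c, c, 3),
        "depthwise 3x3": conv_cost(h, w, c, c, 3, groups=c),  # one filter per channel
        "pointwise 1x1": conv_cost(h, w, c, c, 1),
    }
    for name, (flops, nbytes) in layers.items():
        print(f"{name}: {flops / 1e6:.1f} MFLOPs, CMR = {cmr(flops, nbytes):.1f} FLOPs/byte")
```

With these hypothetical sizes, the depthwise layer performs 1/512 of the standard layer's FLOPs. However, it reaches a CMR of only about 2.2 FLOPs/byte, against roughly 90 for the standard convolution. This is why a model built mostly from such layers tends to be limited by memory traffic rather than arithmetic.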
However, while these models are efficient for inference on ARM, how do they perform for inference or training on GPUs? Unfortunately, although compact models are fast because of their small size, they usually cannot fully utilize a GPU. In this paper, we present a series of extensive experiments on training compact models. The experiments cover a single host (with GPU and CPU) and distributed environments. We then give some analysis of, and suggestions for, the training.
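One way to see why a low CMR leaves a GPU underutilized is the roofline model. In that model, attainable throughput is bounded by min(peak FLOP/s, CMR × memory bandwidth). The sketch below evaluates this bound for a few illustrative CMR values. The peak throughput (10 TFLOP/s) and bandwidth (500 GB/s) are hypothetical figures for a generic GPU, not measurements from this paper.

```python
def attainable_flops(cmr, peak_flops, bandwidth):
    """Roofline bound: performance is capped by compute or by memory traffic."""
    return min(peak_flops, cmr * bandwidth)


def ridge_point(peak_flops, bandwidth):
    """CMR (FLOPs/byte) above which a kernel can become compute-bound."""
    return peak_flops / bandwidth


if __name__ == "__main__":
    PEAK = 10e12  # hypothetical peak: 10 TFLOP/s
    BW = 500e9    # hypothetical DRAM bandwidth: 500 GB/s
    print(f"ridge point: {ridge_point(PEAK, BW):.0f} FLOPs/byte")
    for cmr in (2.0, 10.0, 50.0, 100.0):  # illustrative CMR values
        perf = attainable_flops(cmr, PEAK, BW)
        print(f"CMR {cmr:>5.1f}: {perf / 1e12:.1f} TFLOP/s ({perf / PEAK:.0%} of peak)")
```

Under these assumptions, a layer with a CMR of 2 FLOPs/byte can reach at most 10% of peak, however fast the arithmetic units are. A layer above the ridge point of 20 FLOPs/byte can, in principle, become compute-bound.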
Notes
1. Chipset: Qualcomm MSM8996 Snapdragon 821; CPU: quad-core (4 × 2.15/2.16 GHz Kryo).
2. Unlike in the original papers, the computational complexity and memory accesses reported here also include the pooling, lateral, and activation layers.
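Note 2 matters because pooling and activation layers do very little arithmetic per byte moved, so counting them lowers a model's overall CMR. The sketch below shows one plausible way to count them. Its conventions are assumptions for illustration, not necessarily the counting rules used in the paper:

- one FLOP per ReLU element;
- k×k−1 comparisons per max-pooling output;
- float32 data read and written once.

The function names are hypothetical.

```python
def relu_cost(h, w, c, bytes_per_elem=4):
    """(flops, bytes) for an elementwise ReLU on an h x w x c tensor."""
    n = h * w * c
    flops = n                          # assumed: one max(x, 0) per element
    nbytes = 2 * n * bytes_per_elem    # read the input, write the output
    return flops, nbytes


def maxpool_cost(h, w, c, k=2, s=2, bytes_per_elem=4):
    """(flops, bytes) for a k x k max pooling with stride s and no padding."""
    oh = (h - k) // s + 1
    ow = (w - k) // s + 1
    out = oh * ow * c
    flops = out * (k * k - 1)          # assumed: k*k - 1 comparisons per output value
    nbytes = (h * w * c + out) * bytes_per_elem
    return flops, nbytes


if __name__ == "__main__":
    for name, (flops, nbytes) in {
        "ReLU": relu_cost(14, 14, 512),
        "2x2 max pool": maxpool_cost(14, 14, 512),
    }.items():
        print(f"{name}: CMR = {flops / nbytes:.3f} FLOPs/byte")
```

Under these conventions, both ratios stay well below 1 FLOP/byte. Such layers are therefore memory-bound on typical CPUs and GPUs, and including them pulls a compact model's aggregate CMR down further.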
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Yin, L., Chen, X., Qin, Z., Zhang, Z., Feng, J., Li, D. (2018). An Experimental Perspective for Computation-Efficient Neural Networks Training. In: Li, C., Wu, J. (eds) Advanced Computer Architecture. ACA 2018. Communications in Computer and Information Science, vol 908. Springer, Singapore. https://doi.org/10.1007/978-981-13-2423-9_13
DOI: https://doi.org/10.1007/978-981-13-2423-9_13
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-2422-2
Online ISBN: 978-981-13-2423-9
eBook Packages: Computer Science, Computer Science (R0)