Accelerating Deep Learning with a Parallel Mechanism Using CPU + MIC

Fan, Sijiang; Fei, Jiawei; Shen, Li

doi:10.1007/s10766-017-0535-9

Published: 24 October 2017

Volume 46, pages 660–673, (2018)

International Journal of Parallel Programming

Sijiang Fan¹,
Jiawei Fei¹ &
Li Shen¹

Abstract

Deep neural networks (DNNs) are among the most popular machine learning methods and are widely used in modern applications. Training a DNN is time-consuming, and accelerating it has been the focus of much research. In this paper, we speed up the training of DNNs for automatic speech recognition on a heterogeneous CPU + MIC (Many Integrated Core) architecture. We apply asynchronous methods to I/O and communication operations and propose an adaptive load balancing method. We also employ momentum to speed up the convergence of the gradient descent algorithm. Experimental results show that our optimized algorithm achieves a 20-fold speedup on a CPU + MIC platform compared with the original sequential algorithm on a single-core CPU.
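The abstract does not give the exact momentum formulation used in the paper, so the following is only a minimal sketch of classical (heavy-ball) momentum applied to gradient descent. The function names, the toy quadratic objective, and the values of `lr`, `mu`, and `steps` are all assumptions chosen for illustration, not settings from the paper.

```python
def momentum_descent(grad, w0, lr=0.01, mu=0.9, steps=200):
    """Gradient descent with classical momentum (illustrative sketch).

    grad  : callable returning the gradient (list of floats) at w
    w0    : starting point (list of floats)
    lr    : learning rate (assumed value)
    mu    : momentum coefficient; mu = 0 gives plain gradient descent
    """
    w = list(w0)
    v = [0.0] * len(w)  # velocity: decaying sum of past gradient steps
    for _ in range(steps):
        g = grad(w)
        # Blend the previous velocity with the new gradient step.
        v = [mu * vi - lr * gi for vi, gi in zip(v, g)]
        # Move along the velocity rather than the raw gradient.
        w = [wi + vi for wi, vi in zip(w, v)]
    return w


def quadratic_grad(w):
    # Gradient of the toy objective f(w) = 0.5 * (w[0]**2 + 10 * w[1]**2),
    # whose minimum is at the origin.
    return [w[0], 10.0 * w[1]]


start = [5.0, 5.0]
w_momentum = momentum_descent(quadratic_grad, start, mu=0.9)
w_plain = momentum_descent(quadratic_grad, start, mu=0.0)
print("with momentum:   ", w_momentum)
print("without momentum:", w_plain)
```

On this toy problem, with the same learning rate and step count, the momentum run ends much closer to the minimum along the shallow direction `w[0]` than the plain run does. This is the kind of convergence benefit the abstract refers to, although behaviour on a real DNN depends on the model and hyperparameters.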

Acknowledgements

This work is supported in part by the National Natural Science Foundation of China under Grant No. 61472431. The authors would like to thank Chengkun Wu for his advice, and the anonymous reviewers for their time, work, and valuable feedback.

Author information

Authors and Affiliations

School of Computer, National University of Defense Technology, Changsha, 410073, China
Sijiang Fan, Jiawei Fei & Li Shen

Corresponding author

Correspondence to Li Shen.

About this article

Cite this article

Fan, S., Fei, J. & Shen, L. Accelerating Deep Learning with a Parallel Mechanism Using CPU + MIC. Int J Parallel Prog 46, 660–673 (2018). https://doi.org/10.1007/s10766-017-0535-9

Received: 27 August 2017
Accepted: 13 October 2017
Published: 24 October 2017
Issue Date: August 2018
DOI: https://doi.org/10.1007/s10766-017-0535-9
