Accelerating Deep Learning with a Parallel Mechanism Using CPU + MIC

International Journal of Parallel Programming

Abstract

Deep neural networks (DNNs) are among the most popular machine learning methods and are widely used in many modern applications. Training DNNs is time-consuming, and accelerating it has been the focus of much research. In this paper, we speed up the training of DNNs applied to automatic speech recognition on a heterogeneous CPU + MIC (Intel Many Integrated Core) architecture. We apply asynchronous methods to I/O and communication operations and propose an adaptive load balancing method. In addition, we employ momentum to speed up the convergence of the gradient descent algorithm. Experimental results show that our optimized algorithm achieves a 20-fold speedup on a CPU + MIC platform compared with the original sequential algorithm on a single-core CPU.
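The abstract names momentum and adaptive load balancing without giving formulas. As a minimal sketch, assuming the classical (heavy-ball) momentum rule and a simple policy that splits each mini-batch between CPU and MIC in proportion to measured throughput, the two ideas can be written as follows. The function names, the learning rate `lr`, the momentum coefficient `mu`, and the proportional-split policy are illustrative assumptions, not the authors' method.

```python
from typing import List, Sequence


def momentum_step(weights: List[float], grads: Sequence[float],
                  velocity: List[float], lr: float = 0.1,
                  mu: float = 0.9) -> None:
    """Classical (heavy-ball) momentum update, applied in place.

    v <- mu * v - lr * g
    w <- w + v
    (Hypothetical helper; lr and mu are illustrative defaults.)
    """
    for i, g in enumerate(grads):
        velocity[i] = mu * velocity[i] - lr * g
        weights[i] += velocity[i]


def split_batch(batch_size: int, throughputs: Sequence[float]) -> List[int]:
    """Split a mini-batch across devices in proportion to their measured
    throughput (samples per second), e.g. [cpu_rate, mic_rate].

    Hypothetical policy: each device gets the floor of its proportional
    share, and any leftover samples go to the fastest devices first so
    that the shares always sum to batch_size.
    """
    total = sum(throughputs)
    shares = [int(batch_size * t / total) for t in throughputs]
    leftover = batch_size - sum(shares)
    # Indices of devices, fastest first.
    order = sorted(range(len(throughputs)), key=lambda i: -throughputs[i])
    for k in range(leftover):
        shares[order[k % len(order)]] += 1
    return shares


if __name__ == "__main__":
    # Minimize f(w) = w^2 (gradient 2w) with momentum.
    w, v = [1.0], [0.0]
    for _ in range(200):
        momentum_step(w, [2.0 * w[0]], v)
    print(w[0])
    # A device three times faster receives three times the samples.
    print(split_batch(256, [1.0, 3.0]))
```

For example, if the MIC processes samples three times as fast as the host CPU, `split_batch(256, [1.0, 3.0])` assigns 64 samples to the CPU and 192 to the MIC. Re-measuring throughput after each iteration and recomputing the split is what would make such a scheme adaptive.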


Figures 1–8 (not reproduced)

Acknowledgements

This work is supported in part by the National Natural Science Foundation of China under Grant No. 61472431. The authors would like to thank Chengkun Wu for his advice, and the anonymous reviewers for their time, work, and valuable feedback.

Author information

Corresponding author

Correspondence to Li Shen.

About this article

Cite this article

Fan, S., Fei, J. & Shen, L. Accelerating Deep Learning with a Parallel Mechanism Using CPU + MIC. Int J Parallel Prog 46, 660–673 (2018). https://doi.org/10.1007/s10766-017-0535-9

  • DOI: https://doi.org/10.1007/s10766-017-0535-9
