Abstract

Deep networks have been widely used in many domains in recent years. However, pre-training deep networks with the greedy layer-wise algorithm is time consuming, and the scalability of this algorithm is severely limited by its inherently sequential nature: only one hidden layer can be trained at a time. To speed up the training of deep networks, this paper focuses on the pre-training phase and proposes a pipelined pre-training algorithm that runs on a distributed cluster. It significantly reduces pre-training time with no loss of recognition accuracy and uses the computational cluster more efficiently than greedy layer-wise pre-training. Finally, we conduct comparative experiments between the greedy layer-wise and pipelined pre-training algorithms on the TIMIT corpus. The results show that the pipelined pre-training algorithm makes efficient use of a distributed GPU cluster: we achieve speed-ups of 2.84 and 5.9 with no loss of recognition accuracy when using 4 and 8 slaves, respectively, with a parallelization efficiency close to 0.73.
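To make the pipelined idea concrete, below is a minimal, single-machine Python sketch of pipelined layer-wise pre-training for a stack of Bernoulli RBMs. It is an illustration only, not the implementation evaluated in the paper: the layer sizes, the `rbm_stage` function, the CD-1 update, and the use of `multiprocessing` queues as stand-ins for the master/slave GPU nodes are all assumptions made for this example. Each stage owns one RBM layer, trains on mini-batches arriving from the previous stage, and immediately forwards its hidden activations downstream, so several layers train concurrently instead of strictly one after another as in greedy layer-wise pre-training.

```python
# Minimal sketch of pipelined layer-wise pre-training (hypothetical names,
# not the paper's distributed GPU implementation).
import numpy as np
from multiprocessing import Process, Queue


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def rbm_stage(layer_id, n_visible, n_hidden, q_in, q_out, lr=0.05):
    """Train one RBM layer on mini-batches streamed from the previous stage."""
    rng = np.random.default_rng(layer_id)
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b_h = np.zeros(n_hidden)
    b_v = np.zeros(n_visible)
    while True:
        v0 = q_in.get()
        if v0 is None:                       # sentinel: upstream data exhausted
            if q_out is not None:
                q_out.put(None)
            break
        # One CD-1 update for a Bernoulli RBM.
        h0_prob = sigmoid(v0 @ W + b_h)
        h0 = (rng.random(h0_prob.shape) < h0_prob).astype(float)
        v1_prob = sigmoid(h0 @ W.T + b_v)
        h1_prob = sigmoid(v1_prob @ W + b_h)
        W += lr * (v0.T @ h0_prob - v1_prob.T @ h1_prob) / len(v0)
        b_h += lr * (h0_prob - h1_prob).mean(axis=0)
        b_v += lr * (v0 - v1_prob).mean(axis=0)
        if q_out is not None:                # feed the next layer while this one keeps training
            q_out.put(h0_prob)
    print(f"layer {layer_id}: done, ||W|| = {np.linalg.norm(W):.3f}")


if __name__ == "__main__":
    layer_sizes = [64, 32, 16]               # hypothetical 64-32-16 DBN
    queues = [Queue(maxsize=4) for _ in range(len(layer_sizes) - 1)]
    stages = []
    for i in range(len(layer_sizes) - 1):
        q_out = queues[i + 1] if i + 1 < len(queues) else None
        p = Process(target=rbm_stage,
                    args=(i, layer_sizes[i], layer_sizes[i + 1], queues[i], q_out))
        p.start()
        stages.append(p)
    rng = np.random.default_rng(0)
    for _ in range(200):                     # the master streams binary mini-batches into stage 0
        queues[0].put((rng.random((32, layer_sizes[0])) > 0.5).astype(float))
    queues[0].put(None)
    for p in stages:
        p.join()
```

On an actual cluster, the bounded queues would presumably be replaced by message passing between GPU nodes, but the principle is the same: as soon as a layer has processed a mini-batch, its activations become training data for the next layer, which is where the speed-up over strictly sequential greedy pre-training comes from.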

Acknowledgments

This work was supported by the National Natural Science Foundation of China (61650205), the Inner Mongolia Autonomous Region Natural Sciences Foundation (2014MS0608), and the Inner Mongolian University of Technology Key Fund (ZD201118).

Author information

Corresponding authors

Correspondence to Zhiqiang Ma, Tuya Li, Shuangtao Yang or Li Zhang.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Ma, Z., Li, T., Yang, S., Zhang, L. (2017). A Pipelined Pre-training Algorithm for DBNs. In: Sun, M., Wang, X., Chang, B., Xiong, D. (eds) Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data. NLP-NABD CCL 2017. Lecture Notes in Computer Science, vol 10565. Springer, Cham. https://doi.org/10.1007/978-3-319-69005-6_5

  • DOI: https://doi.org/10.1007/978-3-319-69005-6_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-69004-9

  • Online ISBN: 978-3-319-69005-6

  • eBook Packages: Computer Science, Computer Science (R0)
