Abstract
Deep networks have been widely used in many domains in recent years. However, pre-training deep networks with the greedy layer-wise algorithm is time consuming, and the scalability of that algorithm is severely restricted by its inherently sequential nature: only one hidden layer can be trained at a time. To speed up the training of deep networks, this paper focuses on the pre-training phase and proposes a pipelined pre-training algorithm that runs on a distributed GPU cluster, significantly reducing pre-training time with no loss of recognition accuracy and using the cluster more efficiently than greedy layer-wise pre-training. We compare the greedy layer-wise and pipelined pre-training algorithms on the TIMIT corpus; the results show that the pipelined algorithm makes efficient use of the distributed GPU cluster, achieving speed-ups of 2.84 with 4 slaves and 5.9 with 8 slaves, with no loss of recognition accuracy and a parallelization efficiency close to 0.73.
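The abstract does not spell out the scheduling details of the pipelined scheme, but the idea can be illustrated with a minimal single-process sketch: each "slave" owns one RBM layer, and as soon as a mini-batch has been processed by layer k its hidden activations are forwarded to layer k+1, so all layers train concurrently on different mini-batches instead of waiting for the layer below to finish as in the greedy scheme. The layer sizes, hyperparameters, and Bernoulli CD-1 updates below are illustrative assumptions, not the authors' implementation (a Gaussian-Bernoulli first layer would be more typical for real acoustic features).

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Bernoulli-Bernoulli RBM trained with one step of contrastive divergence (CD-1)."""
    def __init__(self, n_visible, n_hidden, lr=0.01, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = 0.01 * self.rng.standard_normal((n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)
        self.b_h = np.zeros(n_hidden)
        self.lr = lr

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def cd1_update(self, v0):
        h0 = self.hidden_probs(v0)
        h0_sample = (self.rng.random(h0.shape) < h0).astype(float)
        v1 = sigmoid(h0_sample @ self.W.T + self.b_v)          # reconstruction
        h1 = self.hidden_probs(v1)
        n = v0.shape[0]
        self.W   += self.lr * (v0.T @ h0 - v1.T @ h1) / n
        self.b_v += self.lr * (v0 - v1).mean(axis=0)
        self.b_h += self.lr * (h0 - h1).mean(axis=0)

# Hypothetical sizes: 39-dim acoustic features and three hidden layers of 512 units.
layer_sizes = [39, 512, 512, 512]
rbms = [RBM(layer_sizes[k], layer_sizes[k + 1], seed=k) for k in range(len(layer_sizes) - 1)]

rng = np.random.default_rng(42)
batches = [rng.random((128, layer_sizes[0])) for _ in range(20)]   # toy mini-batches

# Pipelined layer-wise pre-training: position k of `in_flight` holds the data
# currently assigned to layer k, so every layer trains during the same step.
# In the distributed version each position would live on its own GPU slave.
in_flight = []
for incoming in batches + [None] * (len(rbms) - 1):    # trailing bubbles drain the pipeline
    in_flight.insert(0, incoming)                      # new mini-batch enters at layer 0
    in_flight = in_flight[:len(rbms)]                  # data leaving the top layer is finished
    forwarded = []
    for k, data in enumerate(in_flight):
        if data is None:                               # pipeline bubble during start-up/drain
            forwarded.append(None)
            continue
        rbms[k].cd1_update(data)                       # slave k trains its own layer
        forwarded.append(rbms[k].hidden_probs(data))   # activations passed on to layer k + 1
    in_flight = forwarded

The reported numbers are consistent with this picture: speed-up divided by the number of slaves gives 2.84 / 4 ≈ 0.71 and 5.9 / 8 ≈ 0.74, roughly the 0.73 efficiency quoted in the abstract, with the remaining loss presumably due to pipeline fill/drain and communication overhead.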
Acknowledgments
This work was supported by the National Natural Science Foundation of China (61650205), the Inner Mongolia Autonomous Region Natural Science Foundation (2014MS0608), and the Inner Mongolia University of Technology Key Fund (ZD201118).
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Ma, Z., Li, T., Yang, S., Zhang, L. (2017). A Pipelined Pre-training Algorithm for DBNs. In: Sun, M., Wang, X., Chang, B., Xiong, D. (eds.) Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data. CCL 2017, NLP-NABD 2017. Lecture Notes in Computer Science, vol. 10565. Springer, Cham. https://doi.org/10.1007/978-3-319-69005-6_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-69004-9
Online ISBN: 978-3-319-69005-6