Abstract
With the advancement of the Internet of Things (IoT), speech recognition in mobile-terminal applications has become a new trend. Consequently, accelerating training and improving accuracy in speech recognition have attracted the attention of both academia and industry. The Deep Belief Network (DBN) accelerated by a Graphics Processing Unit (GPU) is commonly applied as the acoustic model in speech recognition, yet critical research challenges remain: the GPU often cannot store all of the DBN's parameters at once, the GPU's shared memory is not fully used, and parameter transmission becomes a bottleneck when multiple GPUs are employed. This paper presents a method in which the weight matrix is divided into sub-weight matrices and a reasonable memory model is established. To eliminate inefficient idle time during data transfers, a stream-processing model is proposed in which data transfer and kernel execution are performed simultaneously. Further, the optimized single-GPU implementation is extended to multiple GPUs to address the parameter-transmission bottleneck. Experimental results show that the optimized GPU implementation trains the DBN without violating the size limitation of the GPU's memory.
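The core memory idea in the abstract, dividing a weight matrix that is too large for device memory into sub-weight matrices and processing them one block at a time, can be sketched on the host with NumPy. This is an illustrative sketch only, not the authors' CUDA implementation: the function name `blockwise_matvec` and the block size are assumptions, and on a real GPU each block's transfer would additionally be overlapped with the previous block's kernel execution via CUDA streams.

```python
import numpy as np

def blockwise_matvec(W, x, block_rows=2):
    """Compute W @ x by splitting W into row-blocks (sub-weight matrices).

    Host-side analogue of the paper's memory model: only one sub-weight
    matrix needs to reside on the device at a time, so W as a whole may
    exceed the GPU's memory. With CUDA streams, copying block i+1 could
    proceed while the kernel for block i runs.
    """
    out = np.empty(W.shape[0], dtype=np.result_type(W, x))
    for start in range(0, W.shape[0], block_rows):
        sub_W = W[start:start + block_rows]   # one sub-weight matrix
        out[start:start + block_rows] = sub_W @ x
    return out

# Blockwise result agrees with the monolithic product.
rng = np.random.default_rng(0)
W = rng.standard_normal((6, 4))
x = rng.standard_normal(4)
assert np.allclose(blockwise_matvec(W, x), W @ x)
```

The same row-block decomposition applies to the matrix-matrix products in DBN pre-training (contrastive divergence), where each sub-weight matrix is paired with the corresponding slice of the hidden-unit activations.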
Notes
- 1.
In 2006, NVIDIA introduced CUDA [11], a parallel computing platform and programming model for its GPUs, aiming to make full use of the computing power of GPUs for general-purpose computation. CUDA enables programmers without any knowledge of graphics APIs to write C/C++ code for high-performance scientific computation on NVIDIA GPUs. It is therefore widely used in speech recognition based on the DBN model.
References
Gubbi, J., Buyya, R., Marusic, S., Palaniswami, M.: Internet of things (IoT): a vision, architectural elements, and future directions. Future Gen. Comput. Syst. 29(7), 1645–1660 (2013)
Su, D., Wu, X., Xu, L.: GMM-HMM acoustic model training by a two level procedure with Gaussian components determined by automatic model selection. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 4890–4893, Texas, USA, March 2010
Hinton, G.E., Salakhutdinov, R.R.: Reducing the dimensionality of data with neural networks. Science 313(5786), 504–507 (2006)
Seide, F., Li, G., Yu, D.: Conversational speech transcription using context-dependent deep neural networks. In: Conference of the International Speech Communication Association (INTERSPEECH), Florence, Italy, pp. 437–440, August 2011
Sainath, T.N., Kingsbury, B., Ramabhadran, B., Fousek, P.: Making deep belief networks effective for large vocabulary continuous speech recognition. In: Proceedings of Automatic Speech Recognition and Understanding, pp. 30–35, December 2011
Raina, R., Madhavan, A., Ng, A.Y.: Large-scale deep unsupervised learning using graphics processors. In: Proceedings of International Conference on Machine Learning (ICML), Montreal, Quebec, Canada, pp. 873–880, June 2009
Lopes, N., Ribeiro, B.: Towards adaptive learning with improved convergence of deep belief networks on graphics processing units. Pattern Recogn. 47(1), 114–127 (2014)
Hinton, G.E.: Training products of experts by minimizing contrastive divergence. Neural Comput. 14(8), 1771–1800 (2002)
Swersky, K., Chen, B., Marlin, B., De Freitas, N.: A tutorial on stochastic approximation algorithms for training restricted Boltzmann machines and deep belief nets. In: Information Theory and Applications Workshop, pp. 1–10, January 2010
Deng, L., Togneri, R.: Deep dynamic models for learning hidden representations of speech features. In: Ogunfunmi, T., Togneri, R., Narasimha, M. (eds.) Speech & Audio Processing for Coding Enhancement & Recognition, pp. 153–195. Springer, Heidelberg (2015). https://doi.org/10.1007/978-1-4939-1456-2_6
NVIDIA. What is CUDA (2006)
Wang, Y., Tang, P., An, H., Liu, Z., Wang, K., Zhou, Y.: Optimization and analysis of parallel back propagation neural network on GPU using CUDA. In: Arik, S., Huang, T., Lai, W.K., Liu, Q. (eds.) ICONIP 2015. LNCS, vol. 9491, pp. 156–163. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-26555-1_18
Povey, D., et al.: The Kaldi speech recognition toolkit. IDIAP Publications (2012)
Xue, S., Yan, S., Dai, L.: Fast training algorithm for deep neural network using multiple GPUs. J. Tsinghua Univ. (Sci. Technol.) 53(6), 745–748 (2013)
Acknowledgement
The work described in this paper is supported by Guangdong Provincial Key Laboratory of Petrochemical Equipment Fault Diagnosis, Guangdong University of Petrochemical Technology (GDUPTKLAB201502) and Special Fund for Forest Scientific Research in the Public Welfare (201504307).
Copyright information
© 2018 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering
Cite this paper
Jing, W., Jiang, T., Mukherjee, M., Shu, L., Kang, J. (2018). An Optimized Implementation of Speech Recognition Combining GPU with Deep Belief Network for IoT. In: Lin, YB., Deng, DJ., You, I., Lin, CC. (eds) IoT as a Service. IoTaaS 2017. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 246. Springer, Cham. https://doi.org/10.1007/978-3-030-00410-1_30
DOI: https://doi.org/10.1007/978-3-030-00410-1_30
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-00409-5
Online ISBN: 978-3-030-00410-1