An Optimized Implementation of Speech Recognition Combining GPU with Deep Belief Network for IoT

  • Conference paper
  • First Online:
IoT as a Service (IoTaaS 2017)

Abstract

With the advancement of the Internet of Things (IoT), speech recognition in mobile terminal applications has become a new trend. Consequently, how to accelerate training and improve accuracy in speech recognition has attracted attention from both academia and industry. Although the Deep Belief Network (DBN) accelerated by a Graphics Processing Unit (GPU) is commonly applied to the acoustic model of speech recognition, critical research challenges remain: the GPU cannot store all DBN parameters at once, the GPU's shared memory is not fully exploited, and parameter transmission becomes a bottleneck in multi-GPU systems. This paper presents a new method in which the weight matrix is divided into sub-weight matrices and a reasonable memory model is established. To eliminate inefficient idle time during data transfers, a stream processing model is proposed in which data transfer and kernel execution are performed simultaneously. Furthermore, the optimized single-GPU implementation is extended to multiple GPUs to address the parameter transmission bottleneck. Experimental results show that the optimized GPU implementation accelerates training without violating the size limitation of the GPU's memory.
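
The stream processing model mentioned in the abstract builds on the standard CUDA technique of overlapping asynchronous host-to-device copies with kernel execution across several streams. The sketch below illustrates only that general pattern under assumed values (four chunks standing in for sub-weight matrices, a toy scaleChunk kernel in place of a DBN training step); it is not the authors' implementation.

#include <cuda_runtime.h>
#include <cstdio>

// Hypothetical element-wise kernel standing in for one per-chunk training step.
__global__ void scaleChunk(float *data, int n, float alpha) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= alpha;
}

int main() {
    const int kChunks = 4;            // assumed number of sub-weight matrices
    const int kChunkElems = 1 << 20;  // assumed elements per chunk
    const size_t chunkBytes = kChunkElems * sizeof(float);

    float *hostBuf;                   // pinned host memory enables async copies
    cudaMallocHost((void **)&hostBuf, kChunks * chunkBytes);
    for (int i = 0; i < kChunks * kChunkElems; ++i) hostBuf[i] = 1.0f;

    float *devBuf;
    cudaMalloc((void **)&devBuf, kChunks * chunkBytes);

    cudaStream_t streams[kChunks];
    for (int c = 0; c < kChunks; ++c) cudaStreamCreate(&streams[c]);

    // Each chunk's copy and kernel go into their own stream, so the copy of
    // chunk c+1 can overlap with the kernel still working on chunk c.
    for (int c = 0; c < kChunks; ++c) {
        size_t off = (size_t)c * kChunkElems;
        cudaMemcpyAsync(devBuf + off, hostBuf + off, chunkBytes,
                        cudaMemcpyHostToDevice, streams[c]);
        scaleChunk<<<(kChunkElems + 255) / 256, 256, 0, streams[c]>>>(
            devBuf + off, kChunkElems, 0.5f);
        cudaMemcpyAsync(hostBuf + off, devBuf + off, chunkBytes,
                        cudaMemcpyDeviceToHost, streams[c]);
    }
    cudaDeviceSynchronize();

    printf("first element after scaling: %f\n", hostBuf[0]);  // expected 0.5

    for (int c = 0; c < kChunks; ++c) cudaStreamDestroy(streams[c]);
    cudaFree(devBuf);
    cudaFreeHost(hostBuf);
    return 0;
}

Because each chunk uses its own stream and the host buffer is pinned, transfers for one chunk can proceed while the kernel for another chunk is running, which is the kind of idle-time elimination the abstract refers to.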

Notes

  1. In 2006, a parallel computing platform and programming model for NVIDIA GPUs named CUDA [11] was introduced, aiming to make full use of the computing power of GPUs for general-purpose computation. CUDA also enables programmers without any knowledge of graphics APIs to write C/C++ code for high-performance scientific computation on NVIDIA GPUs; it is therefore widely used in speech recognition based on the DBN model (see the sketch after this note).
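
For illustration only, the following minimal vector-addition program shows the kind of plain C/C++ CUDA code the note describes, with no graphics API involved; the kernel, sizes, and use of unified memory are assumptions for the sketch, not taken from the paper.

#include <cuda_runtime.h>
#include <cstdio>

// Illustrative kernel: element-wise vector addition written in plain CUDA C++.
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 16;
    const size_t bytes = n * sizeof(float);

    float *a, *b, *c;                 // unified memory keeps the example short
    cudaMallocManaged((void **)&a, bytes);
    cudaMallocManaged((void **)&b, bytes);
    cudaMallocManaged((void **)&c, bytes);
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    vecAdd<<<(n + 255) / 256, 256>>>(a, b, c, n);
    cudaDeviceSynchronize();

    printf("c[0] = %f\n", c[0]);      // expected 3.0
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}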

References

  1. Gubbi, J., Buyya, R., Marusic, S., Palaniswami, M.: Internet of things (IoT): a vision, architectural elements, and future directions. Future Gen. Comput. Syst. 29(7), 1645–1660 (2013)

  2. Su, D., Wu, X., Xu, L.: GMM-HMM acoustic model training by a two level procedure with Gaussian components determined by automatic model selection. In: IEEE International Conference on Acoustics Speech and Signal Processing, pp. 4890–4893, Texas, USA, March 2010

  3. Hinton, G.E., Salakhutdinov, R.R.: Reducing the dimensionality of data with neural networks. Science 313(5786), 504–507 (2006)

  4. Seide, F., Li, G., Yu, D.: Conversational speech transcription using context-dependent deep neural networks. In: Conference of the International Speech Communication Association (INTERSPEECH), Florence, Italy, pp. 437–440, August 2011

  5. Sainath, T.N., Kingsbury, B., Ramabhadran, B., Fousek, P.: Making deep belief networks effective for large vocabulary continuous speech recognition. In: Proceedings of Automatic Speech Recognition and Understanding, pp. 30–35, December 2011

  6. Raina, R., Madhavan, A., Ng, A.Y.: Large-scale deep unsupervised learning using graphics processors. In: Proceedings of International Conference on Machine Learning (ICML), Montreal, Quebec, Canada, pp. 873–880, June 2009

  7. Lopes, N., Ribeiro, B.: Towards adaptive learning with improved convergence of deep belief networks on graphics processing units. Pattern Recogn. 47(1), 114–127 (2014)

  8. Hinton, G.E.: Training products of experts by minimizing contrastive divergence. Neural Comput. 14(8), 1771–1800 (2002)

  9. Swersky, K., Chen, B., Marlin, B., De Freitas, N.: A tutorial on stochastic approximation algorithms for training restricted Boltzmann machines and deep belief nets. In: Information Theory and Applications Workshop, pp. 1–10, January 2010

  10. Deng, L., Togneri, R.: Deep dynamic models for learning hidden representations of speech features. In: Ogunfunmi, T., Togneri, R., Narasimha, M. (eds.) Speech & Audio Processing for Coding Enhancement & Recognition, pp. 153–195. Springer, Heidelberg (2015). https://doi.org/10.1007/978-1-4939-1456-2_6

  11. NVIDIA. What is CUDA (2006)

  12. Wang, Y., Tang, P., An, H., Liu, Z., Wang, K., Zhou, Y.: Optimization and analysis of parallel back propagation neural network on GPU Using CUDA. In: Arik, S., Huang, T., Lai, W.K., Liu, Q. (eds.) ICONIP 2015. LNCS, vol. 9491, pp. 156–163. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-26555-1_18

  13. Povey, D., et al.: The Kaldi speech recognition toolkit. IDIAP Publications (2012)

  14. Xue, S., Yan, S., Dai, L.: Fast training algorithm for deep neural network using multiple GPUs. J. Tsinghua Univ. (Sci. Technol.) 53(6), 745–748 (2013)


Acknowledgement

The work described in this paper is supported by Guangdong Provincial Key Laboratory of Petrochemical Equipment Fault Diagnosis, Guangdong University of Petrochemical Technology (GDUPTKLAB201502) and Special Fund for Forest Scientific Research in the Public Welfare (201504307).

Author information

Corresponding author

Correspondence to Weipeng Jing.


Copyright information

© 2018 ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering

About this paper

Cite this paper

Jing, W., Jiang, T., Mukherjee, M., Shu, L., Kang, J. (2018). An Optimized Implementation of Speech Recognition Combining GPU with Deep Belief Network for IoT. In: Lin, YB., Deng, DJ., You, I., Lin, CC. (eds) IoT as a Service. IoTaaS 2017. Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, vol 246. Springer, Cham. https://doi.org/10.1007/978-3-030-00410-1_30

  • DOI: https://doi.org/10.1007/978-3-030-00410-1_30

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-00409-5

  • Online ISBN: 978-3-030-00410-1

  • eBook Packages: Computer Science, Computer Science (R0)
