Adaptation of Deep Neural Network Acoustic Models for Robust Automatic Speech Recognition

Chapter in New Era for Robust Speech Recognition

Abstract

Deep neural networks (DNNs) have been successfully applied to many pattern classification problems, including acoustic modelling for automatic speech recognition (ASR). However, DNN adaptation remains a challenging task, and many approaches have been proposed in recent years to improve the adaptability of DNNs for robust ASR. This chapter reviews recent adaptation methods for DNNs, broadly categorising them into constrained adaptation, feature normalisation, feature augmentation and structured DNN parameterisation. Specifically, we describe various methods of estimating reliable representations for feature augmentation, focusing primarily on comparing i-vectors with bottleneck features. We also present an adaptable DNN layer parameterisation scheme based on a linear interpolation structure, whose interpolation weights can be reliably adjusted to adapt the DNN to different conditions. This generic scheme subsumes many existing DNN adaptation methods, including speaker-code adaptation, learning hidden unit contributions (LHUC), factorised hidden layers and cluster adaptive training for DNNs.
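
To make the interpolation structure concrete, below is a minimal NumPy sketch of such an adaptable layer, not the implementation used in the chapter. It assumes K shared basis weight matrices W_k and bias vectors b_k, a sigmoid activation, and a per-speaker or per-condition weight vector lambda; all names and dimensions are hypothetical. At adaptation time only lambda would be re-estimated, leaving the shared bases fixed.

    import numpy as np

    def interpolated_layer(x, bases, biases, lam):
        """One adaptable hidden layer whose effective parameters are a
        lambda-weighted linear interpolation of K shared basis parameter sets.

        x      : (d_in,)          input activation vector
        bases  : (K, d_out, d_in) shared basis weight matrices W_k
        biases : (K, d_out)       shared basis bias vectors b_k
        lam    : (K,)             condition-dependent interpolation weights
        """
        W = np.tensordot(lam, bases, axes=1)  # sum_k lam_k * W_k -> (d_out, d_in)
        b = lam @ biases                      # sum_k lam_k * b_k -> (d_out,)
        z = W @ x + b
        return 1.0 / (1.0 + np.exp(-z))       # sigmoid activation (an assumption)

    # Toy usage: three bases, one condition-dependent weight vector.
    rng = np.random.default_rng(0)
    K, d_in, d_out = 3, 5, 4
    bases = rng.standard_normal((K, d_out, d_in))
    biases = rng.standard_normal((K, d_out))
    lam = np.array([0.5, 0.3, 0.2])           # re-estimated per speaker/condition
    h = interpolated_layer(rng.standard_normal(d_in), bases, biases, lam)

As the abstract notes, suitable choices of the bases and of where the interpolation weights enter the network recover speaker codes, LHUC, factorised hidden layers and cluster adaptive training as special cases.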


Notes

  1. A speaker super-vector is a concatenation of the mean vectors of a Gaussian mixture model that represents the feature distribution for each speaker.
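
As a toy numeric illustration of this definition (the component count and feature dimension below are made up), the sketch stacks the per-speaker GMM component means into a single vector:

    import numpy as np

    # Hypothetical sizes: a 4-component GMM over 3-dimensional features.
    C, d = 4, 3
    speaker_means = np.random.default_rng(0).standard_normal((C, d))  # per-speaker component means
    super_vector = speaker_means.reshape(-1)  # concatenate the C mean vectors
    assert super_vector.shape == (C * d,)     # 12-dimensional super-vector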

Author information

Corresponding author

Correspondence to Khe Chai Sim.

Copyright information

© 2017 Springer International Publishing AG

Cite this chapter

Sim, K.C., Qian, Y., Mantena, G., Samarakoon, L., Kundu, S., Tan, T. (2017). Adaptation of Deep Neural Network Acoustic Models for Robust Automatic Speech Recognition. In: Watanabe, S., Delcroix, M., Metze, F., Hershey, J. (eds) New Era for Robust Speech Recognition. Springer, Cham. https://doi.org/10.1007/978-3-319-64680-0_9

  • DOI: https://doi.org/10.1007/978-3-319-64680-0_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-64679-4

  • Online ISBN: 978-3-319-64680-0
