Abstract
Deep neural networks (DNNs) have been successfully applied to many pattern classification problems, including acoustic modelling for automatic speech recognition (ASR). However, DNN adaptation remains a challenging task. Many approaches have been proposed in recent years to improve the adaptability of DNNs for robust ASR. This chapter reviews recent adaptation methods for DNNs, broadly categorising them into constrained adaptation, feature normalisation, feature augmentation and structured DNN parameterisation. Specifically, we describe various methods of estimating reliable representations for feature augmentation, focusing primarily on a comparison between i-vectors and bottleneck features. We also present an adaptable DNN layer parameterisation scheme based on a linear interpolation structure, whose interpolation weights can be reliably adjusted to adapt the DNN to different conditions. This generic scheme subsumes many existing DNN adaptation methods, including speaker-code adaptation, learning hidden unit contributions (LHUC), factorised hidden layers and cluster adaptive training for DNNs.
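To make the linear-interpolation scheme concrete, the following is a minimal PyTorch sketch, not the chapter's implementation: the layer name, dimensions, basis count and the mean-squared-error adaptation objective are illustrative assumptions. The idea it illustrates is that a hidden layer's weight matrix is expressed as a condition-dependent interpolation of shared basis matrices, W(s) = Σ_k λ_k(s) W_k, and that adaptation updates only the low-dimensional interpolation weights λ(s) while the shared bases stay frozen.

```python
# Hedged sketch of an interpolated hidden layer:  W(s) = sum_k lambda_k(s) * W_k.
# Only the interpolation weights lambda(s) are updated during adaptation.
import torch
import torch.nn as nn


class InterpolatedHiddenLayer(nn.Module):
    """Hypothetical layer illustrating the interpolation-based parameterisation."""

    def __init__(self, in_dim: int, out_dim: int, num_bases: int):
        super().__init__()
        # K basis weight matrices, shared across all speakers/conditions.
        self.bases = nn.Parameter(torch.randn(num_bases, out_dim, in_dim) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_dim))

    def forward(self, x: torch.Tensor, lam: torch.Tensor) -> torch.Tensor:
        # lam: (K,) speaker/condition-dependent interpolation weights.
        w = torch.einsum("k,koi->oi", lam, self.bases)  # W(s) = sum_k lam_k * W_k
        return torch.sigmoid(x @ w.t() + self.bias)


# Adaptation: freeze the shared bases and estimate only lam for a new condition
# from a small amount of adaptation data (random placeholders used here).
layer = InterpolatedHiddenLayer(in_dim=440, out_dim=1024, num_bases=4)
lam = nn.Parameter(torch.full((4,), 0.25))       # start from a uniform mixture
opt = torch.optim.SGD([lam], lr=0.1)             # bases are excluded from the optimiser

x = torch.randn(32, 440)                         # adaptation frames (placeholder)
target = torch.randn(32, 1024)                   # stand-in for the real supervision signal
for _ in range(10):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(layer(x, lam), target)
    loss.backward()
    opt.step()
```

With a single basis and per-unit scaling the same structure reduces to LHUC-style adaptation, and with multiple full-rank bases it resembles cluster adaptive training for DNNs, which is how the generic scheme subsumes those methods.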
Notes
1. A speaker super-vector is the concatenation of the mean vectors of a Gaussian mixture model that represents the feature distribution of a given speaker.
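As an illustration only (the shapes and the least-squares step below are assumptions, not the estimation procedure used in the chapter), this numpy sketch builds such a super-vector and shows how the i-vector model explains it as a low-rank shift of a speaker-independent super-vector, M(s) ≈ m + T·w(s), with w(s) being the low-dimensional i-vector.

```python
# Toy sketch of the super-vector idea: stack the C component means (dimension D
# each) of a speaker-adapted GMM into a single C*D vector, then relate it to a
# low-dimensional i-vector via the total-variability model M(s) ~= m + T @ w(s).
import numpy as np

C, D, R = 512, 39, 100                     # GMM components, feature dim, i-vector dim (illustrative)
speaker_means = np.random.randn(C, D)      # per-speaker adapted component means (placeholder)
supervector = speaker_means.reshape(-1)    # speaker super-vector of length C*D

m = np.random.randn(C * D)                 # speaker-independent (UBM) super-vector (placeholder)
T = np.random.randn(C * D, R)              # total-variability matrix (placeholder)
# Crude least-squares stand-in for the i-vector; real systems use a MAP estimate
# over Baum-Welch statistics rather than the super-vector directly.
w = np.linalg.lstsq(T, supervector - m, rcond=None)[0]
```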
Copyright information
© 2017 Springer International Publishing AG
About this chapter
Cite this chapter
Sim, K.C., Qian, Y., Mantena, G., Samarakoon, L., Kundu, S., Tan, T. (2017). Adaptation of Deep Neural Network Acoustic Models for Robust Automatic Speech Recognition. In: Watanabe, S., Delcroix, M., Metze, F., Hershey, J. (eds) New Era for Robust Speech Recognition. Springer, Cham. https://doi.org/10.1007/978-3-319-64680-0_9
DOI: https://doi.org/10.1007/978-3-319-64680-0_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-64679-4
Online ISBN: 978-3-319-64680-0
eBook Packages: Computer Science, Computer Science (R0)