Sequence-Discriminative Training of Neural Networks

Chen, Guoguo; Zhang, Yu; Yu, Dong

doi:10.1007/978-3-319-64680-0_12

Chapter
First Online: 26 July 2017

Abstract

In this chapter we explore sequence-discriminative training techniques for neural-network–hidden-Markov-model (NN-HMM) hybrid speech recognition systems. We first review sequence-discriminative training criteria for NN-HMM hybrid systems, including maximum mutual information (MMI), boosted MMI (BMMI), minimum phone error (MPE), and state-level minimum Bayes risk (sMBR). We then focus on the sMBR criterion and demonstrate a few heuristics, such as the choice of denominator language model order and frame smoothing, that may improve recognition performance. We further propose a two-forward-pass procedure to speed up sequence-discriminative training when memory is the main constraint. Experiments were conducted on the AMI meeting corpus.

Author information

Authors and Affiliations

Johns Hopkins University, Baltimore, MD, USA
Guoguo Chen
Massachusetts Institute of Technology, Cambridge, MA, USA
Yu Zhang
Microsoft Research, Redmond, WA, USA
Dong Yu

Corresponding author

Correspondence to Guoguo Chen.

Editor information

Editors and Affiliations

Mitsubishi Electric Research Laboratories (MERL), Cambridge, Massachusetts, USA
Shinji Watanabe
NTT Communication Science Laboratories, NTT Corporation, Kyoto, Japan
Marc Delcroix
Language Technologies Institute, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
Florian Metze
Mitsubishi Electric Research Laboratories (MERL), Cambridge, Massachusetts, USA
John R. Hershey

About this chapter

Cite this chapter

Chen, G., Zhang, Y., Yu, D. (2017). Sequence-Discriminative Training of Neural Networks. In: Watanabe, S., Delcroix, M., Metze, F., Hershey, J. (eds) New Era for Robust Speech Recognition. Springer, Cham. https://doi.org/10.1007/978-3-319-64680-0_12

DOI: https://doi.org/10.1007/978-3-319-64680-0_12
Published: 26 July 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-64679-4
Online ISBN: 978-3-319-64680-0
eBook Packages: Computer Science; Computer Science (R0)
