
Sequence-Discriminative Training of Neural Networks

Abstract

In this chapter we explore sequence-discriminative training techniques for neural-network–hidden-Markov-model (NN-HMM) hybrid speech recognition systems. We first review several sequence-discriminative training criteria for NN-HMM hybrid systems, including maximum mutual information (MMI), boosted MMI (BMMI), minimum phone error (MPE), and state-level minimum Bayes risk (sMBR). We then focus on the sMBR criterion and demonstrate a few heuristics that may improve recognition performance, such as the choice of denominator language model order and frame smoothing. We further propose a two-forward-pass procedure to speed up sequence-discriminative training when memory is the main constraint. Experiments are conducted on the AMI meeting corpus.
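The criteria listed above can be compared on a toy problem. The sketch below is illustrative only and is not the chapter's implementation. It assumes that the denominator lattice is replaced by a short explicit list of competing HMM state sequences, each paired with a language-model log-probability (in practice these would come from the denominator language model). It also treats the network's per-frame log posteriors directly as acoustic scores (state priors are ignored) and uses frame-level state accuracy as the sMBR gain. MPE would follow the same pattern as sMBR, with phone-level rather than state-level accuracy. The names `kappa` (acoustic scale), `boost` (BMMI boosting factor) and `H` (frame-smoothing weight) are hypothetical. The frame-smoothed loss is assumed to be a linear interpolation between frame-level cross-entropy and the negated sequence objective.

```python
import math

def logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def acoustic_logscore(log_post, states):
    # Sum of per-frame log scores along one state sequence (priors ignored).
    return sum(log_post[t][s] for t, s in enumerate(states))

def state_accuracy(hyp, ref):
    # Number of frames whose hypothesized state matches the reference state.
    return sum(1 for h, r in zip(hyp, ref) if h == r)

def hyp_log_weights(log_post, hyps, kappa, boost=0.0, ref=None):
    # kappa * acoustic score + LM log-prob, minus boost * accuracy when boosting (BMMI).
    weights = []
    for states, lm_logprob in hyps:
        w = kappa * acoustic_logscore(log_post, states) + lm_logprob
        if boost:
            w -= boost * state_accuracy(states, ref)
        weights.append(w)
    return weights

def mmi(log_post, ref, ref_lm, hyps, kappa, boost=0.0):
    # Log of the reference score over the (optionally boosted) denominator sum.
    num = kappa * acoustic_logscore(log_post, ref) + ref_lm
    return num - logsumexp(hyp_log_weights(log_post, hyps, kappa, boost, ref))

def smbr(log_post, ref, hyps, kappa):
    # Expected frame-level state accuracy under the scaled hypothesis posteriors.
    w = hyp_log_weights(log_post, hyps, kappa)
    z = logsumexp(w)
    return sum(math.exp(wi - z) * state_accuracy(s, ref) for wi, (s, _) in zip(w, hyps))

def frame_cross_entropy(log_post, ref):
    # Frame-level cross-entropy against the reference state alignment.
    return -sum(log_post[t][s] for t, s in enumerate(ref))

def frame_smoothed_loss(seq_objective, ce, H):
    # Assumed form: (1 - H) * CE + H * (-sequence objective).
    return (1.0 - H) * ce + H * (-seq_objective)

# Toy utterance: 3 frames, 2 HMM states; each row is a per-frame posterior.
post = [[0.8, 0.2], [0.4, 0.6], [0.3, 0.7]]
log_post = [[math.log(p) for p in row] for row in post]
ref = [0, 0, 1]
# Stand-in for a denominator lattice: (state sequence, LM log-prob), reference included.
hyps = [([0, 0, 1], math.log(0.5)), ([0, 1, 1], math.log(0.3)), ([1, 1, 1], math.log(0.2))]

F_mmi = mmi(log_post, ref, math.log(0.5), hyps, kappa=1.0)
F_bmmi = mmi(log_post, ref, math.log(0.5), hyps, kappa=1.0, boost=0.1)
F_smbr = smbr(log_post, ref, hyps, kappa=1.0)
ce = frame_cross_entropy(log_post, ref)
print(F_mmi, F_bmmi, F_smbr, frame_smoothed_loss(F_smbr, ce, H=0.9))
```

Boosting multiplies each denominator path by exp(-boost * accuracy), which down-weights accurate paths relative to error-prone ones. Because every path here has positive accuracy, the denominator shrinks and `F_bmmi` ends up above `F_mmi`. In a real system the sums would be computed by forward-backward over a lattice rather than by enumeration.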

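The two-forward-pass idea can be illustrated with a minimal NumPy sketch. This does not reproduce the chapter's exact procedure. It is one plausible reading, under stated assumptions:

- The first pass runs the network over the whole utterance and keeps only its output posteriors. These are enough to compute the sequence-level (here, sMBR) error signal at every frame.
- The second pass recomputes hidden activations one chunk at a time and backpropagates the stored error signal, so only one chunk's activations need to be held in memory at once.

The one-hidden-layer network, the explicit hypothesis list standing in for a lattice, and the names `two_pass_smbr_grad` and `chunk` are hypothetical. For a feed-forward network the chunked gradient is exact. For recurrent networks trained with truncated BPTT, it would generally be an approximation.

```python
import numpy as np

def log_softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def forward(W1, W2, X):
    # One tanh hidden layer followed by a log-softmax over the HMM states.
    h = np.tanh(X @ W1.T)
    return h, log_softmax(h @ W2.T)

def smbr_and_logit_grad(log_post, hyps, ref, kappa):
    # sMBR objective over an explicit hypothesis list, and its gradient w.r.t. the logits.
    T = log_post.shape[0]
    states = np.array([s for s, _ in hyps])                  # (N, T)
    lm = np.array([lp for _, lp in hyps])                    # (N,)
    acc = (states == np.array(ref)).sum(axis=1)              # frame accuracy per hypothesis
    w = kappa * log_post[np.arange(T), states].sum(axis=1) + lm
    p = np.exp(w - w.max())
    p /= p.sum()                                             # hypothesis posteriors
    F = p @ acc                                              # expected state accuracy
    g = np.zeros_like(log_post)                              # dF / d log_post
    for n in range(len(hyps)):
        g[np.arange(T), states[n]] += kappa * p[n] * (acc[n] - F)
    # Chain rule through log-softmax: dF/dz = g - softmax(z) * sum_k g_k.
    return F, g - np.exp(log_post) * g.sum(axis=1, keepdims=True)

def two_pass_smbr_grad(W1, W2, X, hyps, ref, kappa, chunk):
    # Pass 1: whole-utterance forward; only the (T, K) log posteriors are kept.
    _, log_post = forward(W1, W2, X)
    F, err = smbr_and_logit_grad(log_post, hyps, ref, kappa)
    # Pass 2: recompute hidden activations chunk by chunk and backpropagate the stored error.
    gW1, gW2 = np.zeros_like(W1), np.zeros_like(W2)
    for start in range(0, X.shape[0], chunk):
        xc, ec = X[start:start + chunk], err[start:start + chunk]
        hc = np.tanh(xc @ W1.T)
        gW2 += ec.T @ hc
        gW1 += ((ec @ W2) * (1.0 - hc ** 2)).T @ xc
    return F, gW1, gW2

rng = np.random.default_rng(0)
T, n_in, n_hidden, n_states = 6, 4, 5, 3
X = rng.standard_normal((T, n_in))
W1 = 0.5 * rng.standard_normal((n_hidden, n_in))
W2 = 0.5 * rng.standard_normal((n_states, n_hidden))
ref = [0, 0, 1, 1, 2, 2]
hyps = [(ref, np.log(0.4)), ([0, 1, 1, 1, 2, 2], np.log(0.3)),
        ([0, 0, 1, 2, 2, 2], np.log(0.2)), ([1, 1, 1, 2, 2, 0], np.log(0.1))]
F, gW1, gW2 = two_pass_smbr_grad(W1, W2, X, hyps, ref, kappa=0.5, chunk=2)
print(F, np.abs(gW1).max(), np.abs(gW2).max())
```

With `chunk` equal to the number of frames, the second pass reduces to ordinary full-utterance backpropagation, which gives a convenient consistency check.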
Author information

Correspondence to Guoguo Chen.

Copyright information

© 2017 Springer International Publishing AG

About this chapter

Cite this chapter

Chen, G., Zhang, Y., Yu, D. (2017). Sequence-Discriminative Training of Neural Networks. In: Watanabe, S., Delcroix, M., Metze, F., Hershey, J. (eds) New Era for Robust Speech Recognition. Springer, Cham. https://doi.org/10.1007/978-3-319-64680-0_12

  • DOI: https://doi.org/10.1007/978-3-319-64680-0_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-64679-4

  • Online ISBN: 978-3-319-64680-0

  • eBook Packages: Computer Science, Computer Science (R0)
