
Lead ASR Models to Generalize Better Using Approximated Bias-Variance Tradeoff

  • Conference paper
Neural Information Processing (ICONIP 2023)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1961)


Abstract

The conventional recipe for Automatic Speech Recognition (ASR) models is to 1) train multiple checkpoints on a training set while relying on a validation set to prevent overfitting via early stopping, and 2) average the last several checkpoints, or those with the lowest validation losses, to obtain the final model. In this paper, we rethink and update early stopping and checkpoint averaging from the perspective of the bias-variance tradeoff. Theoretically, bias and variance reflect a model's fitness and variability, and their tradeoff determines the overall generalization error; in practice, however, they are impractical to evaluate precisely. As an alternative, we take the training loss and validation loss as proxies of bias and variance and guide early stopping and checkpoint averaging using their tradeoff, namely an Approximated Bias-Variance Tradeoff (ApproBiVT). When evaluated with advanced ASR models, our recipe provides 2.5%–3.7% and 3.1%–4.6% CER reduction on AISHELL-1 and AISHELL-2, respectively. (The code and sampled unaugmented training sets used in this paper will be publicly available on GitHub.)
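The abstract does not spell out the exact selection rule, so the following is only a minimal sketch of how a tradeoff-guided checkpoint-averaging step could look, assuming the sum of training loss and validation loss serves as the proxy score; the function name approbivt_average and the checkpoint bookkeeping are hypothetical illustrations, not the authors' released code.

# Minimal sketch (assumption: proxy score = train_loss + valid_loss; the
# paper's exact criterion may differ). Each entry in `checkpoints` is a
# tuple (path_to_state_dict, train_loss, valid_loss) logged per epoch.
import torch

def approbivt_average(checkpoints, out_path, k=10):
    # Rank checkpoints by the assumed bias-variance proxy instead of
    # validation loss alone, and keep the k best.
    ranked = sorted(checkpoints, key=lambda c: c[1] + c[2])[:k]

    # Standard checkpoint averaging: element-wise mean of the parameters.
    avg = None
    for path, _, _ in ranked:
        state = torch.load(path, map_location="cpu")
        if avg is None:
            avg = {name: p.clone().float() for name, p in state.items()}
        else:
            for name, p in state.items():
                avg[name] += p.float()
    for name in avg:
        avg[name] /= len(ranked)

    torch.save(avg, out_path)
    return out_path

Under the same assumption, early stopping can reuse the proxy: training halts once the best proxy score has not improved for a fixed number of epochs, rather than when the validation loss alone stops improving.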

Supported by the National Innovation 2030 Major S&T Project of China under Grant 2020AAA0104202 and the Basic Research of the Academy of Broadcasting Science, NRTA, under Grant JBKY20230180.



Author information

Corresponding author

Correspondence to Fangyuan Wang.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Wang, F., Hao, M., Shi, Y., Xu, B. (2024). Lead ASR Models to Generalize Better Using Approximated Bias-Variance Tradeoff. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Communications in Computer and Information Science, vol 1961. Springer, Singapore. https://doi.org/10.1007/978-981-99-8126-7_14


  • DOI: https://doi.org/10.1007/978-981-99-8126-7_14

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-8125-0

  • Online ISBN: 978-981-99-8126-7

  • eBook Packages: Computer Science, Computer Science (R0)
