
Low-Cost Training of Speech Recognition System for Hindi ASR Challenge 2022

  • Conference paper
Speech and Computer (SPECOM 2022)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13721)


Abstract

This paper describes the speech recognition system developed for the Gram Vaani ASR Challenge 2022. The acoustic modeling techniques include i-vector-based speaker adaptation and a combination of convolutional and factored time-delay neural networks (TDNN-F), fine-tuned with the state-level Minimum Bayes Risk (sMBR) criterion.
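The sMBR criterion rewards the network for the expected number of frames whose HMM state agrees with a reference alignment, with the expectation taken over competing hypotheses weighted by their posteriors. The sketch below is a hedged illustration of that objective only, not the author's training code: it assumes the hypotheses are enumerated explicitly (a real system such as Kaldi uses lattices and forward-backward), and the function names, toy state sequences, and the acoustic scale of 0.1 are illustrative assumptions.

```python
import math
from typing import List, Sequence


def path_posteriors(am_logprobs: Sequence[float], lm_logprobs: Sequence[float],
                    acoustic_scale: float = 0.1) -> List[float]:
    """Posterior of each hypothesis: softmax of (kappa * acoustic + LM) log-scores."""
    scores = [acoustic_scale * am + lm for am, lm in zip(am_logprobs, lm_logprobs)]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def state_accuracy(hyp_states: Sequence[int], ref_states: Sequence[int]) -> int:
    """Number of frames whose hypothesised state matches the reference alignment."""
    if len(hyp_states) != len(ref_states):
        raise ValueError("hypothesis and reference must cover the same frames")
    return sum(1 for h, r in zip(hyp_states, ref_states) if h == r)


def smbr_objective(hyps: Sequence[Sequence[int]], am_logprobs: Sequence[float],
                   lm_logprobs: Sequence[float], ref_states: Sequence[int],
                   acoustic_scale: float = 0.1) -> float:
    """Expected frame-level state accuracy over an explicitly enumerated hypothesis set."""
    posts = path_posteriors(am_logprobs, lm_logprobs, acoustic_scale)
    return sum(p * state_accuracy(h, ref_states) for p, h in zip(posts, hyps))


# Toy example (hypothetical data): two 4-frame hypotheses, reference alignment [1, 1, 2, 2].
hyps = [[1, 1, 2, 2], [1, 2, 2, 2]]
ref = [1, 1, 2, 2]
print(smbr_objective(hyps, [0.0, 0.0], [0.0, 0.0], ref))   # → 3.5 (equal posteriors)
print(smbr_objective(hyps, [10.0, 0.0], [0.0, 0.0], ref))  # ≈ 3.73 (correct path favoured)
```

During fine-tuning, it is the derivative of this expected accuracy with respect to the network outputs that is backpropagated; the sketch only evaluates the quantity being maximized.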

Experiments with text data augmentation and with separating the different domains present in the test data are also discussed.
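The abstract does not say how the test-data domains were separated. One simple way to do such a split, shown here purely as a hypothetical illustration and not as the author's method, is to score a first-pass transcript of each utterance with a small per-domain language model and assign the utterance to the best-scoring domain. The sketch assumes add-one smoothed unigram models over whitespace tokens; the function names and the romanized toy sentences are invented for illustration, and a real system would more likely use higher-order n-gram models or a trained classifier.

```python
import math
from collections import Counter
from typing import Dict, List


def train_unigram(texts: List[str]) -> Counter:
    """Whitespace-tokenised unigram counts for one domain's training text."""
    counts: Counter = Counter()
    for text in texts:
        counts.update(text.split())
    return counts


def log_prob(tokens: List[str], counts: Counter, vocab_size: int) -> float:
    """Add-one smoothed unigram log-probability of a token sequence."""
    total = sum(counts.values())
    return sum(math.log((counts[w] + 1) / (total + vocab_size)) for w in tokens)


def assign_domain(hypothesis: str, domains: Dict[str, Counter]) -> str:
    """Return the domain whose unigram model scores the first-pass hypothesis highest."""
    tokens = hypothesis.split()
    vocab = set(tokens)  # shared vocabulary, so unseen words get non-zero probability
    for counts in domains.values():
        vocab.update(counts)
    return max(domains, key=lambda d: log_prob(tokens, domains[d], len(vocab)))


# Toy, romanised example data (hypothetical, not from the challenge corpus).
domains = {
    "agriculture": train_unigram(["kisan fasal mandi", "fasal pani kisan"]),
    "health": train_unigram(["doctor dawai aspatal", "dawai bukhar doctor"]),
}
print(assign_domain("kisan fasal", domains))  # → agriculture
```

Once utterances are grouped this way, each group could, for example, be rescored with a domain-specific language model; that follow-up step is likewise an assumption rather than something the abstract states.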

The proposed system is competitive: it was among the top four participants in the evaluation and showed the best result among individual participants. At the same time, it requires very few computational resources to build, which can be important for developing countries.

The author is employed by VK Company, Ltd.; the proposed system and this paper were developed during the author's vacation.



Acknowledgements

We thank the organizers of the Gram Vaani ASR Challenge 2022 for an interesting and important task, and for their work in collecting and open-sourcing a corpus of Hindi dialects.

Author information


Correspondence to Alexander Zatvornitskiy.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper


Cite this paper

Zatvornitskiy, A. (2022). Low-Cost Training of Speech Recognition System for Hindi ASR Challenge 2022. In: Prasanna, S.R.M., Karpov, A., Samudravijaya, K., Agrawal, S.S. (eds) Speech and Computer. SPECOM 2022. Lecture Notes in Computer Science (LNAI), vol 13721. Springer, Cham. https://doi.org/10.1007/978-3-031-20980-2_60


  • DOI: https://doi.org/10.1007/978-3-031-20980-2_60


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20979-6

  • Online ISBN: 978-3-031-20980-2

  • eBook Packages: Computer Science, Computer Science (R0)
