Abstract
This paper describes the speech recognition system developed for the Gram Vaani ASR Challenge 2022. The acoustic modeling techniques include i-vector-based speaker adaptation and a combination of convolutional and factored time-delay neural networks, fine-tuned with the state-level Minimum Bayes Risk criterion.
Experiments with text data augmentation and with separating the different domains present in the test data are also discussed.
The proposed system is quite competitive: it was among the top four participants in the evaluation and showed the best result among individual participants. At the same time, it requires very low computational resources to build, which can be important for developing countries.
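As a rough illustration of the acoustic model described above, the following is a minimal PyTorch sketch, not the author's actual Kaldi recipe, of a convolutional front-end followed by factored time-delay (TDNN-F) layers, with a speaker i-vector appended to every input frame. All layer names and sizes are illustrative assumptions, and the semi-orthogonal constraint of TDNN-F is omitted for brevity.

    # Illustrative sketch (assumed sizes); not the system's actual Kaldi configuration.
    import torch
    import torch.nn as nn


    class TdnnFLayer(nn.Module):
        """Factored TDNN layer: the full-rank weight is replaced by two low-rank
        factors (a linear bottleneck); the semi-orthogonal constraint is omitted."""

        def __init__(self, dim, bottleneck_dim, context=3):
            super().__init__()
            self.factor1 = nn.Conv1d(dim, bottleneck_dim, kernel_size=context, padding=context // 2)
            self.factor2 = nn.Conv1d(bottleneck_dim, dim, kernel_size=1)
            self.norm = nn.BatchNorm1d(dim)

        def forward(self, x):                       # x: (batch, dim, time)
            y = self.factor2(self.factor1(x))
            return self.norm(torch.relu(y)) + x     # residual connection


    class CnnTdnnF(nn.Module):
        """CNN front-end over filterbank features plus a stack of TDNN-F layers,
        with a per-utterance i-vector appended to each frame for speaker adaptation."""

        def __init__(self, feat_dim=40, ivector_dim=100, hidden_dim=512, num_tdnnf=9, num_pdfs=3000):
            super().__init__()
            self.cnn = nn.Sequential(
                nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            )
            in_dim = 32 * feat_dim + ivector_dim
            self.proj = nn.Conv1d(in_dim, hidden_dim, kernel_size=1)
            self.tdnnf = nn.Sequential(*[TdnnFLayer(hidden_dim, 128) for _ in range(num_tdnnf)])
            self.output = nn.Conv1d(hidden_dim, num_pdfs, kernel_size=1)

        def forward(self, feats, ivector):
            # feats: (batch, time, feat_dim); ivector: (batch, ivector_dim)
            x = self.cnn(feats.unsqueeze(1).transpose(2, 3))    # (batch, 32, feat_dim, time)
            x = x.flatten(1, 2)                                 # (batch, 32*feat_dim, time)
            ivec = ivector.unsqueeze(-1).expand(-1, -1, x.size(-1))
            x = torch.cat([x, ivec], dim=1)                     # append i-vector to every frame
            return self.output(self.tdnnf(self.proj(x)))        # per-frame senone logits

In the actual system, training and the subsequent state-level Minimum Bayes Risk fine-tuning would be carried out in a toolkit such as Kaldi; this sketch only outlines the network structure and the i-vector-based speaker adaptation at the input.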
The author is employed by VK Company, Ltd. The proposed system and this paper were developed during vacation.
Acknowledgements
We want to thank the organizers of the Gram Vaani ASR Challenge 2022 for an interesting and important task, and for their work in collecting and open-sourcing a corpus of Hindi dialects.