Abstract
This paper describes the speech recognition system developed for the Gram Vaani ASR Challenge 2022. The acoustic modeling techniques include i-vector-based speaker adaptation and a combination of convolutional and factored time-delay neural networks, fine-tuned with the state-level Minimum Bayes Risk criterion.
Experiments with text data augmentation and with separating the different domains present in the test data are also discussed.
The proposed system is quite competitive: it was among the top four participants in the evaluation and showed the best result among individual participants. At the same time, it requires very low computational resources to build, which can be important for developing countries.
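As a rough illustration of the acoustic model described above, the following is a minimal PyTorch sketch, not the author's actual Kaldi recipe, of a convolutional front-end followed by factored time-delay (TDNN-F) layers, with a speaker i-vector appended to every input frame. All layer names and sizes are illustrative assumptions, and the semi-orthogonal constraint of TDNN-F is omitted for brevity.

    # Illustrative sketch (assumed sizes); not the system's actual Kaldi configuration.
    import torch
    import torch.nn as nn


    class TdnnFLayer(nn.Module):
        """Factored TDNN layer: the full-rank weight is replaced by two low-rank
        factors (a linear bottleneck); the semi-orthogonal constraint is omitted."""

        def __init__(self, dim, bottleneck_dim, context=3):
            super().__init__()
            self.factor1 = nn.Conv1d(dim, bottleneck_dim, kernel_size=context, padding=context // 2)
            self.factor2 = nn.Conv1d(bottleneck_dim, dim, kernel_size=1)
            self.norm = nn.BatchNorm1d(dim)

        def forward(self, x):                       # x: (batch, dim, time)
            y = self.factor2(self.factor1(x))
            return self.norm(torch.relu(y)) + x     # residual connection


    class CnnTdnnF(nn.Module):
        """CNN front-end over filterbank features plus a stack of TDNN-F layers,
        with a per-utterance i-vector appended to each frame for speaker adaptation."""

        def __init__(self, feat_dim=40, ivector_dim=100, hidden_dim=512, num_tdnnf=9, num_pdfs=3000):
            super().__init__()
            self.cnn = nn.Sequential(
                nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            )
            in_dim = 32 * feat_dim + ivector_dim
            self.proj = nn.Conv1d(in_dim, hidden_dim, kernel_size=1)
            self.tdnnf = nn.Sequential(*[TdnnFLayer(hidden_dim, 128) for _ in range(num_tdnnf)])
            self.output = nn.Conv1d(hidden_dim, num_pdfs, kernel_size=1)

        def forward(self, feats, ivector):
            # feats: (batch, time, feat_dim); ivector: (batch, ivector_dim)
            x = self.cnn(feats.unsqueeze(1).transpose(2, 3))    # (batch, 32, feat_dim, time)
            x = x.flatten(1, 2)                                 # (batch, 32*feat_dim, time)
            ivec = ivector.unsqueeze(-1).expand(-1, -1, x.size(-1))
            x = torch.cat([x, ivec], dim=1)                     # append i-vector to every frame
            return self.output(self.tdnnf(self.proj(x)))        # per-frame senone logits

In the actual system, training and the subsequent state-level Minimum Bayes Risk fine-tuning would be carried out in a toolkit such as Kaldi; this sketch only outlines the network structure and the i-vector-based speaker adaptation at the input.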
The author is employed by VK Company, Ltd. The proposed system and this paper were developed during vacation.
Acknowledgements
We want to thank the organizers of the Gram Vaani ASR Challenge 2022 for an interesting and important task, and for their work in collecting and open-sourcing a corpus of Hindi dialects.