Abstract
Automatic speech recognition is a rapidly developing area of machine learning and a necessary tool for controlling various devices and automated systems. However, such recognition systems are aimed more at adults than at children. Because a child’s voice is still developing, applications built on adult speech data make more errors when recognizing children’s speech. In addition, many applications do not account for the peculiarities of children’s speech or for the language children use when communicating with other children and with adults. Thus, there is currently a strong demand for systems that understand both adult and child speech and process them correctly. A further problem is the lack of training data for agglutinative languages, in particular the Turkic languages and especially Kazakh. Assembling a large, high-quality corpus remains an unsolved problem. This paper presents a study of children’s speech recognition based on modified adult speech data and of the impact of such data on recognition quality for the Kazakh language. Two models were built: a Transformer model and an insertion-based model. The results are satisfactory, but further improvement and an expanded corpus of children’s speech are still required.
Acknowledgement
This research has been funded by the Committee of Science of the Ministry of Science and Higher Education of the Republic of Kazakhstan (Grant No. AP19174298).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Oralbekova, D., Mamyrbayev, O., Othman, M., Alimhan, K., Khairova, N., Zhunussova, A. (2023). Difficulties Developing a Children’s Speech Recognition System for Language with Limited Training Data. In: Nguyen, N.T., et al. Advances in Computational Collective Intelligence. ICCCI 2023. Communications in Computer and Information Science, vol 1864. Springer, Cham. https://doi.org/10.1007/978-3-031-41774-0_33
DOI: https://doi.org/10.1007/978-3-031-41774-0_33
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-41773-3
Online ISBN: 978-3-031-41774-0
eBook Packages: Computer Science, Computer Science (R0)