Difficulties Developing a Children’s Speech Recognition System for Language with Limited Training Data

  • Conference paper
Advances in Computational Collective Intelligence (ICCCI 2023)

Abstract

Automatic speech recognition is a rapidly developing area of machine learning and a necessary tool for controlling various devices and automated systems. However, such recognition systems are aimed more at adults than at children. Because a child’s voice is still developing, applications built on adult speech data make more errors when recognizing children’s speech. In addition, many applications do not account for the peculiarities of children’s speech or for the language children use when communicating with other children and with adults. There is therefore a strong demand for systems that understand both adult and child speech and process them correctly. A further problem is the lack of data for agglutinative languages, in particular the Turkic languages and especially Kazakh. Assembling a large, high-quality corpus remains an unsolved problem. This paper presents studies of children’s speech recognition for the Kazakh language based on modified adult speech data and examines the impact of that data on recognition quality. Two models were built: a Transformer model and an insertion-based model. The results are satisfactory, but further improvement is needed, including expansion of the children’s speech corpus.

Acknowledgement

This research has been funded by the Committee of Science of the Ministry of Science and Higher Education of the Republic of Kazakhstan (Grant No. AP19174298).

Author information

Corresponding author

Correspondence to Dina Oralbekova.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Oralbekova, D., Mamyrbayev, O., Othman, M., Alimhan, K., Khairova, N., Zhunussova, A. (2023). Difficulties Developing a Children’s Speech Recognition System for Language with Limited Training Data. In: Nguyen, N.T., et al. Advances in Computational Collective Intelligence. ICCCI 2023. Communications in Computer and Information Science, vol 1864. Springer, Cham. https://doi.org/10.1007/978-3-031-41774-0_33

  • DOI: https://doi.org/10.1007/978-3-031-41774-0_33

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-41773-3

  • Online ISBN: 978-3-031-41774-0

  • eBook Packages: Computer Science, Computer Science (R0)
