Automatic Recognition of Kazakh Speech Using Deep Neural Networks

Mamyrbayev, Orken; Turdalyuly, Mussa; Mekebayev, Nurbapa; Alimhan, Keylan; Kydyrbekova, Aizat; Turdalykyzy, Tolganay

doi:10.1007/978-3-030-14802-7_40

Conference paper
First Online: 07 March 2019

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11432)

Abstract

This article presents a deep neural network (DNN) based automatic speech recognition system for the Kazakh language, developed using the Kaldi speech recognition toolkit. The DNNs are initialized using restricted Boltzmann machines (RBMs) and trained with standard error backpropagation, using cross-entropy as the objective function. To achieve optimal results, the training was adapted to the peculiarities of the Kazakh language. A 76-hour corpus was used for training. The results for two different sets of values are compared between classical models and various DNN settings.
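
As a rough illustration of the cross-entropy objective mentioned in the abstract, the following minimal sketch computes the loss for one frame and its gradient with respect to the output-layer logits, which is the quantity that backpropagation passes down through the network. It is not taken from the paper or from Kaldi: the function names, the softmax output layer, and the single-frame setup are assumptions made for illustration only.

```python
import math

# Hypothetical helper names; this is not the Kaldi API or the authors' code.

def softmax(logits):
    """Convert raw output-layer scores into class probabilities."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    """Cross-entropy loss for one frame whose correct class index is `target`."""
    return -math.log(softmax(logits)[target])

def cross_entropy_grad(logits, target):
    """Gradient of the loss with respect to the logits: softmax(logits) - one_hot(target)."""
    probs = softmax(logits)
    return [p - (1.0 if i == target else 0.0) for i, p in enumerate(probs)]

if __name__ == "__main__":
    frame_logits = [0.5, -1.0, 2.0]  # made-up scores for three classes
    print(cross_entropy(frame_logits, target=2))
    print(cross_entropy_grad(frame_logits, target=2))
```

In a full system the gradient would be averaged over mini-batches of frames and propagated through every hidden layer; the sketch stops at the output layer to keep the objective itself visible.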

Acknowledgements

This work was supported by the Ministry of Education and Science of the Republic of Kazakhstan under project IRN AP05131207, "Development of technologies for multilingual automatic speech recognition using deep neural networks".

Author information

Authors and Affiliations

Institute of Information and Computational Technology, 050010, Almaty, Kazakhstan
Orken Mamyrbayev, Mussa Turdalyuly, Keylan Alimhan & Tolganay Turdalykyzy
al-Farabi Kazakh National University, 050040, Almaty, Kazakhstan
Nurbapa Mekebayev & Aizat Kydyrbekova

Corresponding authors

Correspondence to Orken Mamyrbayev or Mussa Turdalyuly.

Editor information

Editors and Affiliations

Ton Duc Thang University, Ho Chi Minh City, Vietnam
Ngoc Thanh Nguyen
Bina Nusantara University, Jakarta, Indonesia
Ford Lumban Gaol
National University of Kaohsiung, Kaohsiung, Taiwan
Tzung-Pei Hong
Wrocław University of Science and Technology, Wrocław, Poland
Bogdan Trawiński

About this paper

Cite this paper

Mamyrbayev, O., Turdalyuly, M., Mekebayev, N., Alimhan, K., Kydyrbekova, A., Turdalykyzy, T. (2019). Automatic Recognition of Kazakh Speech Using Deep Neural Networks. In: Nguyen, N., Gaol, F., Hong, TP., Trawiński, B. (eds) Intelligent Information and Database Systems. ACIIDS 2019. Lecture Notes in Computer Science, vol 11432. Springer, Cham. https://doi.org/10.1007/978-3-030-14802-7_40

DOI: https://doi.org/10.1007/978-3-030-14802-7_40
Published: 07 March 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-14801-0
Online ISBN: 978-3-030-14802-7
eBook Packages: Computer Science, Computer Science (R0)
