Abstract
This article presents an automatic speech recognition system for the Kazakh language based on deep neural networks (DNNs), developed using the Kaldi speech recognition toolkit. The DNNs are initialized using restricted Boltzmann machines (RBMs) and trained by standard error backpropagation with cross-entropy as the objective function. To obtain the best results, the training procedure was adapted to the peculiarities of the Kazakh language. A 76-hour corpus was used for training. Results for classical models and various DNN settings are compared for two different sets of values.
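As a rough illustration of the training objective named in the abstract, the sketch below computes the cross-entropy loss for one acoustic frame and applies one backpropagation update to a single softmax output layer. It is a toy example and not the authors' Kaldi recipe: RBM pre-training and hidden layers are omitted, and all names (`softmax`, `cross_entropy`, `output_gradient`, `sgd_step`), the feature values, and the learning rate are assumptions made for illustration.

```python
import math

def softmax(logits):
    """Convert output-layer activations into class probabilities."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, target):
    """Cross-entropy loss for one frame whose correct class index is `target`."""
    return -math.log(probs[target])

def output_gradient(probs, target):
    """Gradient of the loss w.r.t. the logits: p_k - 1[k == target]."""
    return [p - (1.0 if k == target else 0.0) for k, p in enumerate(probs)]

def sgd_step(weights, bias, features, target, lr=0.1):
    """One gradient-descent update of a single softmax layer (hypothetical helper)."""
    logits = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weights, bias)]
    probs = softmax(logits)
    loss = cross_entropy(probs, target)
    grad = output_gradient(probs, target)
    # Propagate the error back to the layer parameters.
    new_weights = [[w - lr * g * x for w, x in zip(row, features)]
                   for row, g in zip(weights, grad)]
    new_bias = [b - lr * g for b, g in zip(bias, grad)]
    return new_weights, new_bias, loss

# Toy usage: 3 output classes, 2-dimensional feature vector (made-up values).
W = [[0.0, 0.0] for _ in range(3)]
b = [0.0, 0.0, 0.0]
W, b, loss_before = sgd_step(W, b, [1.0, -0.5], target=2)
_, _, loss_after = sgd_step(W, b, [1.0, -0.5], target=2)
print(loss_before, loss_after)  # the second loss should be smaller
```

In a full DNN the same output-layer gradient would be propagated further back through the hidden layers, whose weights the abstract says are initialized from RBMs rather than from zeros as in this toy case.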
Acknowledgements
This work was supported by the Ministry of Education and Science of the Republic of Kazakhstan under grant IRN AP05131207, "Development of technologies for multilingual automatic speech recognition using deep neural networks".
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Mamyrbayev, O., Turdalyuly, M., Mekebayev, N., Alimhan, K., Kydyrbekova, A., Turdalykyzy, T. (2019). Automatic Recognition of Kazakh Speech Using Deep Neural Networks. In: Nguyen, N., Gaol, F., Hong, TP., Trawiński, B. (eds) Intelligent Information and Database Systems. ACIIDS 2019. Lecture Notes in Computer Science, vol 11432. Springer, Cham. https://doi.org/10.1007/978-3-030-14802-7_40
DOI: https://doi.org/10.1007/978-3-030-14802-7_40
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-14801-0
Online ISBN: 978-3-030-14802-7
eBook Packages: Computer Science, Computer Science (R0)