Abstract
The paper is devoted to isolated word automatic speech recognition. The first part gives a theoretical description of speech signal processing methods and of algorithms that can be used for automatic speech recognition, such as dynamic time warping (DTW), hidden Markov models (HMM) and deep neural networks (DNN). The practical part describes the proposed system, which is based on convolutional neural networks (CNN). The system was designed and implemented in Python using the Keras and TensorFlow frameworks. An open audio dataset of spoken words was used for training and testing. The contribution of the paper lies in the specific CNN-based proposal for automatic speech recognition and its validation. The presented results show that the proposed approach achieves 94% accuracy.
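Dynamic time warping is one of the classical isolated word techniques listed above. The sketch below shows DTW template matching and is not the CNN system the paper proposes. It makes two assumptions. First, each utterance has already been converted into a sequence of feature vectors, such as MFCC frames. Second, the recognized word is the reference template with the smallest accumulated alignment cost. The function names, the toy one-dimensional frames and the labels `yes`/`no` are hypothetical and chosen only for illustration.

```python
import math
from typing import Dict, List, Sequence

Frame = Sequence[float]  # one feature vector, e.g. an MFCC frame (assumed input)


def frame_distance(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature frames."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(query: List[Frame], template: List[Frame]) -> float:
    """Accumulated DTW cost using the three standard local steps."""
    n, m = len(query), len(template)
    inf = float("inf")
    # cost[i][j] = minimal cost of aligning query[:i] with template[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(query[i - 1], template[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # template frame stretched over two query frames
                cost[i][j - 1],      # query frame stretched over two template frames
                cost[i - 1][j - 1],  # one-to-one match
            )
    return cost[n][m]


def recognize(query: List[Frame], templates: Dict[str, List[Frame]]) -> str:
    """Return the label of the template closest to the query under DTW."""
    return min(templates, key=lambda label: dtw_distance(query, templates[label]))


# Hypothetical reference templates (one per vocabulary word) and a test utterance.
templates = {
    "yes": [[0.0], [1.0], [2.0], [1.0]],
    "no": [[2.0], [2.0], [0.0], [0.0]],
}
query = [[0.0], [0.0], [1.0], [2.0], [2.0], [1.0]]  # a slower rendition of "yes"

print(recognize(query, templates))  # prints "yes"
```

The slower query aligns with the `yes` template at zero cost because DTW can repeat frames to absorb differences in speaking rate. The full table costs O(nm) per template. Practical systems often restrict the warping path to a band around the diagonal to reduce this cost.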
Acknowledgment
The research was supported by the Czech Ministry of Education, Youth and Sports from the Large Infrastructures for Research, Experimental Development and Innovations project reg. no. LM2015070 at the IT4Innovations - National Supercomputing Center, where computational time was provided by project OPEN-19-38, and partly by the institutional grant SGS reg. no. SP2020/65 conducted at VSB - Technical University of Ostrava.
A Neural network diagram
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Slívová, M., Partila, P., Továrek, J., Vozňák, M. (2020). Isolated Word Automatic Speech Recognition System. In: Dziech, A., Mees, W., Czyżewski, A. (eds) Multimedia Communications, Services and Security. MCSS 2020. Communications in Computer and Information Science, vol 1284. Springer, Cham. https://doi.org/10.1007/978-3-030-59000-0_19
DOI: https://doi.org/10.1007/978-3-030-59000-0_19
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58999-8
Online ISBN: 978-3-030-59000-0
eBook Packages: Computer Science, Computer Science (R0)