Isolated Word Automatic Speech Recognition System

Slívová, Martina; Partila, Pavol; Továrek, Jaromír; Vozňák, Miroslav

doi:10.1007/978-3-030-59000-0_19

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1284)

Included in the following conference series:

International Conference on Multimedia Communications, Services and Security

Abstract

The paper addresses isolated-word automatic speech recognition. The first part gives a theoretical description of speech signal processing methods and of algorithms that can be used for automatic speech recognition, such as dynamic time warping, hidden Markov models and deep neural networks. The practical part describes the proposed system, which is based on convolutional neural networks (CNN). The system was designed and implemented in Python using the Keras and TensorFlow frameworks. An open audio dataset of spoken words was used for training and testing. The contribution of the paper lies in the specific CNN-based proposal for automatic speech recognition and its validation. The presented results show that the proposed approach achieves 94% accuracy.
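
Of the classical methods named in the abstract, dynamic time warping is the simplest to illustrate. The following is a minimal sketch of template matching with DTW, not the authors' CNN system: the function names `dtw_distance` and `classify` are hypothetical, and the inputs are assumed to be one-dimensional feature sequences (a real recognizer would compare frame-level feature vectors such as MFCCs).

```python
def dtw_distance(a, b):
    """Return the accumulated DTW alignment cost between two 1-D sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # local distance between two samples
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats its sample
                cost[i][j - 1],      # b advances, a repeats its sample
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def classify(query, templates):
    """Return the label of the template closest to the query under DTW."""
    return min(templates, key=lambda label: dtw_distance(query, templates[label]))
```

Because the warping path may repeat samples, a word spoken more slowly still matches its template, which is why DTW suited early isolated-word recognizers with one stored template per word.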

References

Benzeghiba, M., et al.: Automatic speech recognition and speech variability: a review. Speech Commun. 49(10–11), 763–786 (2007)
Petkar, H.: A review of challenges in automatic speech recognition. Int. J. Comput. Appl. 151(3), 23–26 (2016)
Imtiaz, M.A., Raja, G.: Isolated word Automatic Speech Recognition (ASR) system using MFCC, DTW & KNN. In: 2016 Asia Pacific Conference on Multimedia and Broadcasting (APMediaCast), pp. 106–110 (2016). https://doi.org/10.1109/APMediaCast.2016.7878163
Senthildevi, K.A., Chandra, E.: Keyword spotting system for Tamil isolated words using Multidimensional MFCC and DTW algorithm. In: 2015 International Conference on Communication and Signal Processing, ICCSP 2015, pp. 550–554 (2015). https://doi.org/10.1109/ICCSP.2015.7322545. Article No. 7322545
Xu, L., Ke, M.: Research on isolated word recognition with DTW-based. In: ICCSE 2012 - Proceedings of 2012 7th International Conference on Computer Science and Education, pp. 139–141 (2012). https://doi.org/10.1109/ICCSE.2012.6295044. Article No. 6295044
Abu Shariah, M.A.M., Ainon, R.N., Zainuddin, R., Khalifa, O.O.: Human computer interaction using isolated-words speech recognition technology. In: 2007 International Conference on Intelligent and Advanced Systems, ICIAS 2007, pp. 1173–1178 (2007). https://doi.org/10.1109/ICIAS.2007.4658569. Article No. 4658569
Dhanashri, D., Dhonde, S.B.: Isolated word speech recognition system using deep neural networks. In: Satapathy, S., Bhateja, V., Joshi, A. (eds.) Proceedings of the International Conference on Data Engineering and Communication Technology. AISC, vol. 468, pp. 9–17 (2017). https://doi.org/10.1007/978-981-10-1675-2_2
Ranjan, R., Dubey, R.K.: Isolated word recognition using HMM for Maithili dialect. In: 2016 International Conference on Signal Processing and Communication, ICSC 2016, pp. 323–327 (2016). https://doi.org/10.1109/ICSPCom.2016.7980600. Article No. 7980600
Frangoulis, E.: Isolated word recognition in noisy environment by vector quantization of the HMM and noise distributions. In: Proceedings of ICASSP 1991: 1991 International Conference on Acoustics, Speech, and Signal Processing, pp. 413–416 (1991). https://doi.org/10.1109/ICASSP.1991.150364
Zhao, L., Han, Z.: Speech recognition system based on integrating feature and HMM. In: 2010 International Conference on Measuring Technology and Mechatronics Automation, ICMTMA 2010, vol. 3, pp. 449–452 (2010). https://doi.org/10.1109/ICMTMA.2010.298. Article No. 5458876
Singhal, S., Dubey, R.K.: Automatic speech recognition for connected words using DTW/HMM for English/Hindi languages. In: International Conference Communication, Control and Intelligent Systems, CCIS 2015, pp. 199–203 (2016). https://doi.org/10.1109/CCIntelS.2015.7437908. Article No. 7437908
Chen, G., Parada, C., Heigold, G.: Small-footprint keyword spotting using deep neural networks. In: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Florence, pp. 4087–4091 (2014). https://doi.org/10.1109/ICASSP.2014.6854370
Psutka, J.: Mluvíme s počítačem česky. Academia, Praha (2006)
Partila, P., Voznak, M., Mikulec, M., Zdralek, J.: Fundamental frequency extraction method using central clipping and its importance for the classification of emotional state. Adv. Electr. Electron. Eng. 10(4), 270–275 (2012)
Ibrahim, Y.A., Odiketa, J.C., Ibiyemi, T.S.: Preprocessing technique in automatic speech recognition for human computer interaction: an overview. Ann. Comput. Sci. Ser. 15(1), 186–191 (2017)
Bou-Ghazale, S.E., Hansen, J.H.L.: A comparative study of traditional and newly proposed features for recognition of speech under stress. IEEE Trans. Speech Audio Process. 8(4), 429–442 (2000)
Fazio, P., Tropea, M., Sottile, C., Lupia, A.: Vehicular networking and channel modeling: a new Markovian approach. In: 2015 12th Annual IEEE Consumer Communications and Networking Conference (CCNC), pp. 702–707 (2015)
Tropea, M., Fazio, P., Veltri, F., Marano, S.: A new DVB-RCS satellite channel model based on Discrete Time Markov Chain and Quality Degree. In: 2013 IEEE Wireless Communications and Networking Conference (WCNC), pp. 2615–2619 (2013)
Fazio, P., Tropea, M.: A new Markovian prediction scheme for resource reservations in wireless networks with mobile hosts. Adv. Electr. Electron. Eng. 10(4), 204–210 (2012)
Rabiner, L.R.: A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE 77(2), 257–286 (1989). https://doi.org/10.1109/5.18626
Cooke, M., Green, P., Josifovski, V., Vizinho, A.: Robust automatic speech recognition with missing and unreliable acoustic data. Speech Commun. 34(3), 267–285 (2001). https://doi.org/10.1016/S0167-6393(00)00034-0. Accessed 26 Feb 2020
Young, S., et al.: The HTK Book. HTK Version 3.4. B.m.: Cambridge University Engineering Department (2006)
Zaccone, G., Karim, M.R., Menshawy, A.: Deep Learning with TensorFlow. Packt Publishing, Birmingham (2017)
Tropea, M., Fedele, G.: Classifiers comparison for Convolutional Neural Networks (CNNs) in image classification. In: 2019 IEEE/ACM 23rd International Symposium on Distributed Simulation and Real Time Applications (DS-RT), pp. 1–4 (2019). https://doi.org/10.1109/DS-RT47707.2019.8958662
Dorfler, M., Bammer, R., Grill, T.: Inside the spectrogram: Convolutional Neural Networks in audio processing. In: 2017 International Conference on Sampling Theory and Applications (SampTA), pp. 152–155 (2017). https://doi.org/10.1109/SAMPTA.2017.8024472
Gouda, S.K., et al.: Speech recognition: keyword spotting through image recognition. arXiv preprint arXiv:1803.03759 (2018)
Fu, S.-W., Hu, T.-Y., Tsao, Y., Lu, X.: Complex spectrogram enhancement by convolutional neural network with multi-metrics learning. In: IEEE International Workshop on Machine Learning for Signal Processing, MLSP, pp. 1–6 (2017). https://doi.org/10.1109/MLSP.2017.8168119
Abdel-Hamid, O., Mohamed, A.-R., Jiang, H., Deng, L., Penn, G., Yu, D.: Convolutional neural networks for speech recognition. IEEE Trans. Audio Speech Lang. Process. 22(10), 1533–1545 (2014). Article No. 2339736
Warden, P.: Speech commands: a dataset for limited-vocabulary speech recognition. http://arxiv.org/abs/1804.03209. Accessed 09 Mar 2019

Acknowledgment

The research was supported by the Czech Ministry of Education, Youth and Sports from the Large Infrastructures for Research, Experimental Development and Innovations project reg. no. LM2015070 at the IT4Innovations - National Supercomputing Center, where computational time was provided by the project OPEN-19-38, and partly by the institutional grant SGS reg. no. SP2020/65 conducted at VSB - Technical University of Ostrava.

Author information

Authors and Affiliations

VSB–Technical University of Ostrava, 17. listopadu 2172/15, 708 00, Ostrava, Czechia
Martina Slívová, Pavol Partila, Jaromír Továrek & Miroslav Vozňák

Corresponding authors

Correspondence to Martina Slívová, Pavol Partila, Jaromír Továrek or Miroslav Vozňák.

Editor information

Editors and Affiliations

AGH University of Science and Technology, Kraków, Poland
Andrzej Dziech
Royal Military Academy, Brussels, Belgium
Wim Mees
Gdańsk University of Technology, Gdańsk, Poland
Andrzej Czyżewski

Cite this paper

Slívová, M., Partila, P., Továrek, J., Vozňák, M. (2020). Isolated Word Automatic Speech Recognition System. In: Dziech, A., Mees, W., Czyżewski, A. (eds) Multimedia Communications, Services and Security. MCSS 2020. Communications in Computer and Information Science, vol 1284. Springer, Cham. https://doi.org/10.1007/978-3-030-59000-0_19

DOI: https://doi.org/10.1007/978-3-030-59000-0_19
Published: 24 September 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-58999-8
Online ISBN: 978-3-030-59000-0
eBook Packages: Computer Science, Computer Science (R0)
