Isolated Word Automatic Speech Recognition System

  • Conference paper
Multimedia Communications, Services and Security (MCSS 2020)

Abstract

This paper addresses isolated-word automatic speech recognition. The first part gives a theoretical description of speech signal processing methods and of algorithms that can be used for automatic speech recognition, such as dynamic time warping, hidden Markov models and deep neural networks. The practical part describes the proposed system, which is based on convolutional neural networks (CNN). The system was designed and implemented in Python using the Keras and TensorFlow frameworks. An open audio dataset of spoken words was used for training and testing. The contribution of the paper lies in a specific CNN-based proposal for automatic speech recognition and in its validation. The presented results show that the proposed approach can achieve 94% accuracy.
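As an illustration of one of the classical methods named in the abstract, dynamic time warping can be sketched in a few lines of Python. This is not the paper's CNN system: it is a minimal sketch that assumes each utterance has already been reduced to a one-dimensional feature sequence, and the function names `dtw_distance` and `classify`, the absolute-difference local cost and the toy templates are all hypothetical choices made for this example.

```python
from math import inf


def dtw_distance(a, b):
    """Return the cumulative DTW alignment cost between two 1-D sequences.

    Assumption for this sketch: the local cost is the absolute difference
    between two samples. Real systems usually compare feature vectors.
    """
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Allowed steps: advance in a only, in b only, or in both.
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[n][m]


def classify(query, templates):
    """Return the label of the template with the lowest DTW cost to query."""
    return min(templates, key=lambda label: dtw_distance(query, templates[label]))


if __name__ == "__main__":
    # Toy reference sequences, invented for illustration only.
    templates = {"yes": [1, 3, 1], "no": [5, 5, 5]}
    # A time-stretched version of the "yes" template.
    print(classify([1, 1, 3, 3, 1], templates))
```

Because the warping path may repeat samples of either sequence, a word spoken more slowly still aligns with its template at low cost, which is why the method suits isolated-word tasks with small vocabularies.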



Acknowledgment

The research was supported by the Czech Ministry of Education, Youth and Sports from the Large Infrastructures for Research, Experimental Development and Innovations project reg. no. LM2015070 at the IT4Innovations National Supercomputing Center, where computational time was provided by the project OPEN-19-38, and partly by the institutional grant SGS reg. no. SP2020/65 conducted at VSB - Technical University of Ostrava.

Author information

Corresponding authors

Correspondence to Martina Slívová, Pavol Partila, Jaromír Továrek or Miroslav Vozňák.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Slívová, M., Partila, P., Továrek, J., Vozňák, M. (2020). Isolated Word Automatic Speech Recognition System. In: Dziech, A., Mees, W., Czyżewski, A. (eds) Multimedia Communications, Services and Security. MCSS 2020. Communications in Computer and Information Science, vol 1284. Springer, Cham. https://doi.org/10.1007/978-3-030-59000-0_19


  • DOI: https://doi.org/10.1007/978-3-030-59000-0_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-58999-8

  • Online ISBN: 978-3-030-59000-0

  • eBook Packages: Computer Science (R0)
