A Review of Feature Extraction and Classification Techniques in Speech Recognition

  • Review Article
  • Published in: SN Computer Science (2023)

Abstract

Speech is one of the most fundamental and essential forms of human communication, and it offers a natural channel in the human–computer interface. Speech recognition is utilised not only in mobile devices but also in embedded systems, desktop and laptop computers, operating systems, and browsers. It benefits children, senior citizens, and people who are blind or have impaired eyesight, and it is especially important for physically handicapped individuals who rely solely on speech to interact with computer systems. Research in speech recognition continues to expand the ways in which computers can make use of human speech. This review article classifies methods for translating human speech into a format that computers can process, analyses the challenges faced by the most widely used speech recognition systems, and presents possible solutions. Both feature extraction and classification are critical components of a speech recognition system, and the focus of this study is a review of the literature on feature extraction and classification strategies for such systems.
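The two stages named above can be illustrated with a short, self-contained sketch. The example below is not taken from the article: it assumes the widely used librosa and scikit-learn Python libraries, and the file names and word labels in it are hypothetical. It extracts MFCC features from short audio clips and passes them to a support vector machine, one common feature-extraction/classification pairing discussed in the speech recognition literature.

# Illustrative sketch only (not from the article): a minimal two-stage pipeline,
# MFCC feature extraction followed by a conventional classifier.
# Assumes librosa and scikit-learn; audio paths and labels below are hypothetical.
import numpy as np
import librosa
from sklearn.svm import SVC

def extract_mfcc(path, n_mfcc=13):
    """Load an audio file and return a fixed-length MFCC summary vector."""
    signal, sr = librosa.load(path, sr=16000)              # resample to 16 kHz
    mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
    # Summarise each coefficient over time (mean and standard deviation).
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

# Hypothetical isolated-word training data: (audio path, word label) pairs.
train_items = [("yes_01.wav", "yes"), ("no_01.wav", "no")]
X = np.array([extract_mfcc(path) for path, _ in train_items])
y = [label for _, label in train_items]

clf = SVC(kernel="rbf").fit(X, y)                           # classification stage
print(clf.predict([extract_mfcc("unknown.wav")]))           # predicted word label

In a full recognition system the summary-statistics step would typically give way to frame-level modelling with HMMs or neural networks, but the division into a feature-extraction stage and a classification stage remains the same.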


Author information

Corresponding author

Correspondence to Sonal Yadav.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest regarding this work.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Recent Trends on Agents and Artificial Intelligence” guest edited by Jaap van den Herik, Ana Paula Rocha and Luc Steels.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Yadav, S., Kumar, A., Yaduvanshi, A. et al. A Review of Feature Extraction and Classification Techniques in Speech Recognition. SN COMPUT. SCI. 4, 777 (2023). https://doi.org/10.1007/s42979-023-02158-5
