Intelligent stuttering speech recognition: A succinct review

Banerjee, Nilanjan; Borah, Samarjeet; Sethi, Nilambar

doi:10.1007/s11042-022-12817-z


Published: 19 March 2022

Volume 81, pages 24145–24166 (2022)

Multimedia Tools and Applications


Abstract

Stuttering speech recognition is a well-studied problem in speech signal processing. The main focus of this study is the classification of speech disorders. Classification of stuttered speech has become increasingly important with advances in machine learning and deep learning. In this study, some of the recent and most influential stuttering speech recognition methods are reviewed, together with a discussion of the different categories of stuttering. The stuttering speech recognition process is divided into four main stages: input speech pre-emphasis, segmentation, feature extraction, and stutter classification. Each stage is briefly described and the related research is discussed. It is observed that a range of traditional machine learning and deep learning classification approaches have been employed to recognize stuttered speech over the last few decades. A comprehensive analysis of the different feature extraction and classification methods and their efficiency is presented.
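The four-stage process described above (pre-emphasis, segmentation, feature extraction, classification) can be sketched in plain Python. This is a minimal, hypothetical illustration, not the authors' method or that of any reviewed paper. Several simplifications are assumed: a pre-emphasis coefficient of 0.97, fixed 25 ms frames with a 10 ms hop in place of word or syllable segmentation, log short-time energy and zero-crossing rate as cheap stand-ins for cepstral features such as MFCC or LPCC, and a k-nearest-neighbour vote as the classifier. All function names (`pre_emphasis`, `frame_signal`, `frame_features`, `utterance_vector`, `knn_predict`, `tone`) are illustrative, and the demo trains on synthetic sine tones with toy labels rather than on stuttered speech.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical helper names throughout; an illustrative sketch only,
# not the pipeline used in any specific reviewed paper.


def pre_emphasis(x: Sequence[float], alpha: float = 0.97) -> List[float]:
    """Stage 1: first-order high-pass filter y[n] = x[n] - alpha * x[n-1].

    alpha = 0.97 is a common choice, assumed here rather than taken from the review.
    """
    if not x:
        return []
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]


def frame_signal(x: Sequence[float], frame_len: int, hop: int) -> List[List[float]]:
    """Stage 2 (segmentation, simplified): cut the signal into overlapping frames.

    Trailing samples that do not fill a whole frame are dropped.
    """
    return [list(x[s:s + frame_len]) for s in range(0, len(x) - frame_len + 1, hop)]


def frame_features(frame: Sequence[float]) -> Tuple[float, float]:
    """Stage 3: log short-time energy and zero-crossing rate of one frame.

    These two cheap features stand in for cepstral features such as MFCC or LPCC.
    """
    energy = sum(s * s for s in frame) / len(frame)
    log_energy = math.log10(energy + 1e-12)  # small floor avoids log10(0)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return log_energy, crossings / (len(frame) - 1)


def utterance_vector(x: Sequence[float], sr: int, frame_ms: float = 25.0,
                     hop_ms: float = 10.0) -> Tuple[float, float]:
    """Stages 1-3: average the frame features into one vector per utterance."""
    frame_len = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    frames = frame_signal(pre_emphasis(x), frame_len, hop)
    if not frames:
        raise ValueError("signal is shorter than one frame")
    feats = [frame_features(f) for f in frames]
    n = len(feats)
    return (sum(f[0] for f in feats) / n, sum(f[1] for f in feats) / n)


def knn_predict(train: Sequence[Tuple[Tuple[float, ...], str]],
                query: Sequence[float], k: int = 3) -> str:
    """Stage 4: majority vote among the k nearest training vectors (Euclidean)."""
    nearest = sorted((math.dist(vec, query), label) for vec, label in train)[:k]
    votes: Dict[str, int] = {}
    for _, label in nearest:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)


def tone(freq: float, sr: int, dur: float) -> List[float]:
    """Synthetic test signal: a unit-amplitude sine wave."""
    return [math.sin(2 * math.pi * freq * n / sr) for n in range(int(sr * dur))]


if __name__ == "__main__":
    SR = 8000
    # Toy labels on synthetic tones; a real system would train on labelled
    # stuttered-speech segments (e.g. repetitions, prolongations).
    train = [(utterance_vector(tone(f, SR, 0.5), SR), "low") for f in (100, 150, 200)]
    train += [(utterance_vector(tone(f, SR, 0.5), SR), "high") for f in (2500, 3000, 3500)]
    print(knn_predict(train, utterance_vector(tone(120, SR, 0.5), SR)))   # low
    print(knn_predict(train, utterance_vector(tone(3200, SR, 0.5), SR)))  # high
```

In a realistic system the feature stage would typically be replaced by MFCC or LPCC extraction and the classifier by an SVM, ANN, or deep network, as in many of the works listed in the references; the stage boundaries stay the same.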


Fig. 3


References

Alam MJ, Kinnunen T, Kenny P, Ouellet P, O’Shaughnessy D (2013) Multitaper MFCC and PLP features for speaker verification using i-vectors. Speech Comm 55(2):237–251
Alanazi F, Elhadad A, Hamad S, Ghareeb A (2019) Sensors data collection framework using mobile identification with secure data sharing model. Int J Electrical Comput Eng 9(5):4258
Berndt DJ, Clifford J (1994) Using dynamic time warping to find patterns in time series. In KDD workshop (Vol. 10, no. 16, pp. 359-370).
Bhattacharya S, Das N, Sahu S, Mondal A, Borah S (2020) Deep classification of sound: A concise review. In First doctoral symposium on natural computing research (DANCER-2020), Springer, India
Boulmaiz A, Messadeg D, Doghmane N, Taleb-Ahmed A (2017) Design and implementation of a robust acoustic recognition system for waterbird species using TMS320C6713 DSK. Int J Ambient Comput Intell (IJACI) 8(1):98–118
Buza O, Toderean G, Nica A, Caruntu A (2006) Voice signal processing for speech synthesis. In 2006 IEEE international conference on automation, quality and testing, robotics (Vol. 2, pp. 360-364). IEEE.
Chee LS, Ai OC, Yaacob S (2009) Overview of automatic stuttering recognition system. In Proc. international conference on man-machine systems, October, Batu Ferringhi, Penang, Malaysia (pp. 1-6).
Chee LS, Ai OC, Hariharan M, Yaacob S (2009) Automatic detection of prolongations and repetitions using LPCC. In 2009 international conference for technical postgraduates (TECHPOS) (pp. 1-4). IEEE.
Das N, Chakraborty S, Chaki J, Padhy N, Dey N (2020) Fundamentals, present and future perspectives of speech enhancement. Int J Speech Technol, pp 1-19
Dave N (2013) Feature extraction methods LPC, PLP and MFCC in speech recognition. Int J Advan Res Eng Technol 1(6):1–4
Dey N (2019) Intelligent speech signal processing, 1st edn. Academic Press
Elhadad A, Hamad S, Khalifa A, Ghareeb A (2017) High capacity information hiding for privacy protection in digital video files. Neural Comput Applic 28(1):91–95
Elhadad A, Ghareeb A, Abbas S (2021) A blind and high-capacity data hiding of DICOM medical images based on fuzzification concepts. Alexandria Eng J 60(2):2471–2482
Fook CY, Muthusamy H, Chee LS, Yaacob SB, Adom AHB (2013) Comparison of speech parameterization techniques for the classification of speech disfluencies. Turkish J Electrical Eng Comput Sci 21(Sup. 1):1983–1994
Geetha YV, Pratibha K, Ashok R, Ravindra SK (2000) Classification of childhood disfluencies using neural networks. J Fluen Disord 25(2):99–117
Girish M, Anil R, Ahmed A, Hithaish Kumar M (2017) Word repetition analysis in stuttered speech using MFCC and dynamic time warping. In National Conference on Communication and Image Processing, TJIT, Bangalore
Gupta H, Gupta D (2016) LPC and LPCC method of feature extraction in speech recognition system. In 2016 6th international conference-cloud system and big data engineering (confluence) (pp. 498-502). IEEE.
Gupta S, Jaafar J, Ahmad WW, Bansal A (2013) Feature extraction using MFCC. Signal Image Process: Int J (SIPIJ) 4(4):101–108
Hariharan M, Chee LS, Ai OC, Yaacob S (2012) Classification of speech dysfluencies using LPC based parameterization techniques. J Med Syst 36(3):1821–1830
Hariharan M, Vijean V, Fook CY, Yaacob S (2012) Speech stuttering assessment using sample entropy and Least Square support vector machine. In 2012 IEEE 8th international colloquium on signal processing and its applications (pp. 240-245). IEEE.
Healey EC (2010) What the literature tells us about listeners' reactions to stuttering: implications for the clinical management of stuttering. Semin Speech Lang 31(4):227–235. Thieme Medical Publishers
Hermansky H (1990) Perceptual linear predictive (PLP) analysis of speech. J Acoust Soc Am 87(4):1738–1752
Hidayat R, Bejo A, Sumaryono S, Winursito A (2018) Denoising speech for MFCC feature extraction using wavelet transformation in speech recognition system. In 2018 10th international conference on information technology and electrical engineering (ICITEE) (pp. 280-284). IEEE.
Hossan MA, Memon S, Gregory MA (2010) A novel approach for MFCC feature extraction. In 2010 4th international conference on signal processing and communication systems (pp. 1-5). IEEE.
Hosseini R, Walsh B, Tian F, Wang S (2018) An fNIRS-based feature learning and classification framework to distinguish hemodynamic patterns in children who stutter. IEEE Trans Neural Syst Rehabil Eng 26(6):1254–1263
Howell P, Sackin S (1995) Automatic recognition of repetitions and prolongations in stuttered speech. In proceedings of the first world congress on fluency disorders (Vol. 2, pp. 372-374). Nijmegen, the Netherlands: university press Nijmegen.
Howell P, Sackin S, Glenn K (1997) Development of a two-stage procedure for the automatic recognition of dysfluencies in the speech of children who stutter: II. ANN recognition of repetitions and prolongations with supplied word segment markers. J Speech, Language, Hearing Res 40(5):1085–1096
Howell P, Davis S, Bartrip J, Wormald L (2004) Effectiveness of frequency shifted feedback at reducing disfluency for linguistically easy, and difficult, sections of speech (original audio recordings included). Stammer Res: On-Line J Publish Brit Stamm Assoc 1(3):309
Jain AK, Mao J, Mohiuddin KM (1996) Artificial neural networks: A tutorial. Computer 29(3):31–44
Khalil OH, Elhadad A, Ghareeb A (2020) A blind proposed 3D mesh watermarking technique for copyright protection. Imaging Sci J 68(2):90–99
Khan N (2015) The effect of stuttering on speech and learning process, A case study. Int J Stud English Language Literature (IJSELL) 3(4):89–103
Km RK, Ganesan S (2011) Comparison of multidimensional MFCC feature vectors for objective assessment of stuttered disfluencies. Int J Adv Netw Appl 2(05):854–860
KN VN, Meharunnisa SP (2016) Detection and analysis of stuttered speech. Int J Adv Res Electronics Comm Eng (IJARECE) 5(4):2278–909X
Kourkounakis T, Hajavi A, Etemad A (2020) FluentNet: end-to-end detection of speech disfluency with deep learning. arXiv preprint arXiv:2009.11394.
Kumar P, Biswas A, Mishra AN, Chandra M (2010) Spoken language identification using hybrid feature extraction methods. arXiv preprint arXiv:1003.5623.
Li Q, Huang Y (2010) An auditory-based feature extraction algorithm for robust speaker identification under mismatched conditions. IEEE Trans Audio Speech Lang Process 19(6):1791–1801
Likitha MS, Gupta SRR, Hasitha K, Raju AU (2017) Speech based human emotion recognition using MFCC. In 2017 international conference on wireless communications, signal processing and networking (WiSPNET) (pp. 2257-2260). IEEE.
Maas AL, Qi P, Xie Z, Hannun AY, Lengerich CT, Jurafsky D, Ng AY (2017) Building DNN acoustic models for large vocabulary speech recognition. Comput Speech Lang 41:195–213
Mahesha P, Vinod DS (2013) Classification of speech dysfluencies using speech parameterization techniques and multiclass SVM. In international conference on heterogeneous networking for quality, reliability, security and robustness (pp. 298-308). Springer, Berlin, Heidelberg.
Mahesha P, Vinod DS (2015) Combining cepstral and prosodic features for classification of disfluencies in stuttered speech. In intelligent computing, communication and devices (pp. 623–633). Springer, New Delhi
Manjula G, Kumar S (2016) Overview of analysis and classification of stuttered speech. In Proceedings of the 11th IRF International Conference
Manjula G, Kumar MS, Geetha YV, Kasar T (2017) Identification and validation of repetitions/prolongations in stuttering speech using epoch features. Int J Appl Eng Res 12(22):11976–11980
Manjula G, Shivakumar M, Geetha YV (2019) Adaptive optimization based neural network for classification of stuttered speech. In Proceedings of the 3rd international Conference on Cryptography, Security and Privacy (pp. 93-98).
Meenakshi M (2020) Machine learning algorithms and their real-life applications: A survey. Available at SSRN 3595299
Mirri S, Delnevo G, Roccetti M (2020) Is a COVID-19 second wave possible in Emilia-Romagna (Italy)? Forecasting a future outbreak with particulate pollution and machine learning. Computation 8(3):74
Mohan BJ (2014) Speech recognition using MFCC and DTW. In 2014 international conference on advances in electrical engineering (ICAEE) (pp. 1-4). IEEE.
Nöth E, Niemann H, Haderlein T, Decher M, Eysholdt U, Rosanowski F, Wittenberg T (2000) Automatic stuttering recognition using hidden Markov models. In Sixth International Conference on Spoken Language Processing
Oue S, Marxer R, Rudzicz F (2015) Automatic dysfluency detection in dysarthric speech using deep belief networks. In proceedings of SLPAT 2015: 6th workshop on speech and language processing for assistive technologies (pp. 60-64).
Pálfy J, Pospíchal J (2011) Recognition of repetitions using support vector machines. In signal processing algorithms, architectures, arrangements, and applications SPA 2011 (pp. 1-6). IEEE.
Pinelli P (1992) Neurophysiology in the science of speech. Curr Opinion Neurol Neurosurg 5(5):744–755
Prakash CO, Sai YP, Kumar VN (2018) Design and implementation of silent pause stuttered speech recognition system
Qi F, Bao C, Liu Y (2004) A novel two-step SVM classifier for voiced/unvoiced/silence classification of speech. In 2004 international symposium on Chinese spoken language processing (pp. 77-80). IEEE.
Raghavendra M, Rajeswari P (2016) Determination of disfluencies associated in stuttered speech using MFCC feature extraction. Comput. Speech Lang, IJEDR 4(2):2321–9939
Ramteke PB, Koolagudi SG, Afroz F (2016) Repetition detection in stuttered speech. In Proceedings of 3rd international conference on advanced computing, networking and informatics (pp. 611–617). Springer, New Delhi
Ravikumar KM, Reddy B, Rajagopal R, Nagaraj H (2008) Automatic detection of syllable repetition in read speech for objective assessment of stuttered disfluencies. Proceed World Acad Sci, Eng Technol 36:270–273
Ravikumar KM, Rajagopal R, Nagaraj HC (2009) An approach for objective assessment of stuttered speech using MFCC features. ICGST Int J Digital Signal Process, DSP 9(1):19–24
Revada LKV, Rambatla VK, Ande KVN (2011) A novel approach to speech recognition by using generalized regression neural networks. Int J Comput Sci Issues (IJCSI) 8(2):484
Savin PS, Ramteke PB, Koolagudi SG (2016) Recognition of repetition and prolongation in stuttered speech using ANN. In Proceedings of 3rd international conference on advanced computing, networking and informatics (pp. 65–71). Springer, New Delhi
Sen S, Dutta A, Dey N (2019) Audio processing and speech recognition: concepts, techniques and research overviews. Springer
Sen S, Dutta A, Dey N (2019) Speech processing and recognition system. Audio Processing and Speech Recognition. Springer Briefs in Applied Sciences and Technology. Springer, Singapore
Sharma U, Maheshkar S, Mishra AN (2015) Study of robust feature extraction techniques for speech recognition system. In 2015 international conference on futuristic trends on computational analysis and knowledge management (ABLAZE) (pp. 654-658). IEEE.
Shirvan RA, Tahami E (2011) Voice analysis for detecting Parkinson's disease using genetic algorithm and KNN classification method. In 2011 18th Iranian conference of biomedical engineering (ICBME) (pp. 278-283). IEEE.
Subasi A, Gursoy MI (2010) EEG signal classification using PCA, ICA, LDA and support vector machines. Expert Syst Appl 37(12):8659–8666
Suguna N, Thanushkodi K (2010) An improved k-nearest neighbor classification using genetic algorithm. Int J Comp Sci 7(2):18–21
Surya AA, Varghese SM (2016) Automatic speech recognition system for stuttering disabled persons. Int J Control Theory Appl 9(43):16–20
Świetlicka I, Kuniszyk-Jóźkowiak W, Smołka E (2009) Artificial neural networks in the disabled speech analysis. In Computer recognition systems 3 (pp. 347–354). Springer, Berlin, Heidelberg
Szczurowska I, Kuniszyk-Jóźkowiak W, Smołka E (2014) The application of Kohonen and multilayer perceptron networks in the speech nonfluency analysis. Arch Acoust 31(4 (S)):205–210
Tan TS, Ariff AK, Ting CM, Salleh SH (2007) Application of Malay speech technology in Malay speech therapy assistance tools. In 2007 International Conference on Intelligent and Advanced Systems (pp. 330-334). IEEE.
UCLASS database. https://www.uclass.psychol.ucl.ac.uk/. Accessed 01/01/2021
Wahyuni ES (2017) Arabic speech recognition using MFCC feature extraction and ANN classification. In 2017 2nd international conferences on information technology, information systems and electrical engineering (ICITISEE) (pp. 22-25). IEEE.
Wiśniewski M, Kuniszyk-Jóźkowiak W, Smołka E, Suszyński W (2007) Automatic detection of prolonged fricative phonemes with the hidden Markov models approach. J Med Inform Technol:11
Wiśniewski M, Kuniszyk-Jóźkowiak W, Smołka E, Suszyński W (2007) Automatic detection of disorders in a continuous speech with the hidden Markov models approach. In Computer recognition systems 2 (pp. 445–453). Springer, Berlin, Heidelberg
Xie L, Liu ZQ (2006) A comparative study of audio features for audio-to-visual conversion in MPEG-4 compliant facial animation. In 2006 international conference on machine learning and cybernetics (pp. 4359-4364). IEEE.
Yairi E (2007) Subtyping stuttering I: A review. J Fluen Disord 32(3):165–196
Yuhas BP, Goldstein MH, Sejnowski TJ, Jenkins RE (1990) Neural network models of sensory integration for improved vowel recognition. Proc IEEE 78(10):1658–1668
Zhang JM, Harman M, Ma L, Liu Y (2020) Machine learning testing: survey, landscapes and horizons. IEEE Trans Softw Eng


Funding

There is no funding for this research work.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, GIET University, Odisha, India
Nilanjan Banerjee & Nilambar Sethi
Department of Computer Applications, Sikkim Manipal Institute of Technology, Sikkim Manipal University, Sikkim, India
Samarjeet Borah


Corresponding author

Correspondence to Samarjeet Borah.

Ethics declarations

Conflicts of interests/competing interests

The authors declare that there are no conflicts of interest or competing interests in this research work.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Banerjee, N., Borah, S. & Sethi, N. Intelligent stuttering speech recognition: A succinct review. Multimed Tools Appl 81, 24145–24166 (2022). https://doi.org/10.1007/s11042-022-12817-z


Received: 11 February 2021
Revised: 21 February 2022
Accepted: 09 March 2022
Published: 19 March 2022
Issue Date: July 2022
DOI: https://doi.org/10.1007/s11042-022-12817-z

