Abstract
Stuttering speech recognition is a well-studied problem in speech signal processing, and the classification of speech disorders is the main focus of this study. Classification of stuttered speech has grown in importance with advances in machine learning and deep learning. This study reviews some of the recent and most influential stuttering speech recognition methods and discusses the different categories of stuttering. The stuttering speech recognition process is divided into four main stages: input speech pre-emphasis, segmentation, feature extraction, and stutter classification. Each stage is briefly described and the related research is discussed. Various traditional machine learning and deep learning classification approaches have been employed to recognize stuttered speech over the last few decades. A comprehensive analysis of the different feature extraction and classification methods, together with their efficiency, is presented.
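The four stages named in the abstract can be illustrated with a minimal Python sketch. It is not taken from any of the reviewed methods: the function names, the default parameters (`alpha`, `frame_len`, `hop`, `tol`), the two toy features (log energy and zero-crossing rate), and the threshold rule used in place of a trained classifier are all assumptions made for illustration. Real systems would typically use richer features and a learned classifier.

```python
import math

def pre_emphasis(signal, alpha=0.97):
    """Stage 1: first-order high-pass filter, y[n] = x[n] - alpha * x[n-1]."""
    if not signal:
        return []
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]

def segment(signal, frame_len, hop):
    """Stage 2: split into overlapping frames; a trailing partial frame is dropped."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

def extract_features(frame):
    """Stage 3: two toy features per frame (assumed here, frame needs >= 2 samples)."""
    energy = sum(s * s for s in frame) / len(frame)
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return (math.log(energy + 1e-12), crossings / (len(frame) - 1))

def classify(curr, prev, tol=0.05):
    """Stage 4: placeholder rule, NOT a trained model.

    Labels a frame 'stationary' when its features barely differ from the
    previous frame's, and 'changing' otherwise.
    """
    close = all(abs(c - p) <= tol for c, p in zip(curr, prev))
    return "stationary" if close else "changing"

def run_pipeline(signal, frame_len=400, hop=160, alpha=0.97):
    """Chain the four stages; returns one label per frame after the first."""
    frames = segment(pre_emphasis(signal, alpha), frame_len, hop)
    feats = [extract_features(f) for f in frames]
    return [classify(c, p) for p, c in zip(feats, feats[1:])]
```

The defaults of 400 and 160 samples correspond to 25 ms frames with a 10 ms hop at a 16 kHz sampling rate, which is a common but assumed choice. Each stage is a separate function so that any one of them can be replaced, for example by swapping the toy features for cepstral coefficients, without touching the others.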
Funding
There is no funding for this research work.
Ethics declarations
Conflicts of interests/competing interests
The authors declare that there are no conflicts of interest or competing interests in this research work.
About this article
Cite this article
Banerjee, N., Borah, S. & Sethi, N. Intelligent stuttering speech recognition: A succinct review. Multimed Tools Appl 81, 24145–24166 (2022). https://doi.org/10.1007/s11042-022-12817-z
DOI: https://doi.org/10.1007/s11042-022-12817-z