Abstract
Owing to improvements in dysarthric automatic speech recognition (ASR) over the last few decades, the demand for assessment and evaluation of such technologies has increased significantly. Evaluation methods for ASR systems are now required to consider multiple qualitative and quantitative metrics. In this study, exploratory factor analysis is conducted to classify the evaluation metrics applied by researchers. Metrics with a high Pearson correlation coefficient (\(r > .9\)) are placed in the same group, which reduces the number of metrics from 23 to six main metrics. Artificial neural networks (ANNs) do not require any internal knowledge of system parameters and provide solutions for multi-variable problems while delivering fast calculations; hence, they can be used as an alternative to analytical approaches based on the obtained evaluation metrics. Here, the adaptive neuro-fuzzy inference system (ANFIS), which applies an ANN to estimate the membership function parameters of a fuzzy inference system (FIS), was employed for ASR performance evaluation. The proposed algorithm was implemented in MATLAB and used to measure the performance of two dysarthric ASR systems based on the MVML and MVSL active learning theories. The assessment results presented in this paper show the effectiveness of the developed method.
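As a rough illustration of the grouping step, the sketch below places metrics in the same group whenever their pairwise Pearson correlation exceeds the threshold of .9. It is not the exploratory factor analysis used in the study, only a simplified thresholded-correlation grouping, and it is written in Python rather than MATLAB. The function names `pearson` and `group_metrics`, and any metric names passed to them, are hypothetical. The sketch assumes that every metric has the same number of observations and is not constant.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences.

    Assumes neither sequence is constant (non-zero variance).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def group_metrics(scores, threshold=0.9):
    """Group metric names whose pairwise r exceeds the threshold.

    `scores` maps a metric name to its list of observed values.
    Groups are merged transitively using a small union-find structure.
    """
    names = list(scores)
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the group representative, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if pearson(scores[first], scores[second]) > threshold:
                parent[find(second)] = find(first)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return list(groups.values())
```

Calling `group_metrics` on a dictionary of per-system scores returns a list of groups, each of which could then be represented by one main metric. Because merging is transitive, two metrics can share a group through a third metric even when their own correlation falls below the threshold.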
Acknowledgements
This work was funded by the University of Malaya Research Grant (UMRG), Project No. RP003B-13ICT, and the UM High Impact Research Grant UM-MOHE UM.C/HIR/MOHE/FCSIT/05 from the Ministry of Higher Education, Malaysia.
Ethics declarations
Conflict of interest
The authors declare that there is no conflict of interest.
Additional information
Communicated by V. Loia.
About this article
Cite this article
Asemi, A., Salim, S.S.B., Shahamiri, S.R. et al. Adaptive neuro-fuzzy inference system for evaluating dysarthric automatic speech recognition (ASR) systems: a case study on MVML-based ASR. Soft Comput 23, 3529–3544 (2019). https://doi.org/10.1007/s00500-018-3013-4
DOI: https://doi.org/10.1007/s00500-018-3013-4