Abstract
Many modern speech technologies, including speech synthesis and recognition, have recently been developed to help people with disabilities. While most of these technologies have been applied successfully to the speech of normal speakers, they may not be effective for speakers with a speech disorder, depending on its severity. This paper proposes an automated method for a preliminary assessment of a speaker's ability to pronounce a word. Based on signal features, an indicator called the pronouncibility index (Π) is introduced to express speech quality with two complementary measures, called the distance-based and confusion-based factors. For the distance-based factor, the 1-norm, 2-norm and 3-norm distances are investigated, while boundary-based and Gaussian-based approaches are introduced for the confusion-based factor. Π is used to estimate the performance of speech recognition when it is applied to the speech of a dysarthric speaker. Three measures are applied to evaluate the effectiveness of Π: rank-order inconsistency, correlation coefficient, and root-mean-square of difference. The evaluation compares the recognition rates predicted by Π with those predicted by two standard methods, the articulatory and intelligibility tests, based on two recognition systems (HMM and ANN). For the phoneme-test set (the training set), Π outperforms the articulatory and intelligibility tests in all three evaluations. The performance of Π decreases for the device-control set (the test set), where the intelligibility test becomes the best method, followed by Π and the articulatory test. Overall, Π is a promising indicator for predicting recognition rate in comparison with the standard assessments.
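As a rough illustration of the quantities named in the abstract, the following Python sketch computes a p-norm (Minkowski) distance between two feature vectors for p = 1, 2, 3, together with the three evaluation measures. It is not the authors' implementation: the function names are hypothetical, the feature vectors are plain lists of numbers, and the definition of rank-order inconsistency used here (the fraction of item pairs whose ordering differs between predicted and actual recognition rates) is an assumption, since the abstract does not define it.

```python
import math
from itertools import combinations


def minkowski_distance(x, y, p):
    """p-norm distance between two equal-length feature vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def pearson_correlation(predicted, actual):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    std_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted))
    std_a = math.sqrt(sum((a - mean_a) ** 2 for a in actual))
    return cov / (std_p * std_a)


def rms_difference(predicted, actual):
    """Root-mean-square of the element-wise differences."""
    n = len(predicted)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n)


def rank_order_inconsistency(predicted, actual):
    """ASSUMED definition: fraction of pairs ranked in opposite order."""
    pairs = list(combinations(range(len(predicted)), 2))
    discordant = sum(
        1
        for i, j in pairs
        if (predicted[i] - predicted[j]) * (actual[i] - actual[j]) < 0
    )
    return discordant / len(pairs)


if __name__ == "__main__":
    # Made-up numbers for illustration only, not data from the paper.
    u, v = [0.0, 0.0], [3.0, 4.0]
    for p in (1, 2, 3):
        print(f"{p}-norm distance:", minkowski_distance(u, v, p))

    predicted_rates = [0.9, 0.5, 0.7]
    actual_rates = [0.8, 0.6, 0.4]
    print("correlation:", pearson_correlation(predicted_rates, actual_rates))
    print("RMS difference:", rms_difference(predicted_rates, actual_rates))
    print("rank-order inconsistency:",
          rank_order_inconsistency(predicted_rates, actual_rates))
```

Under this reading, a good predictor of recognition rate would show a correlation close to 1 and low values for both the RMS difference and the rank-order inconsistency.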
Cite this article
Kayasith, P., Theeramunkong, T. Pronouncibility index (Π): a distance-based and confusion-based speech quality measure for dysarthric speakers. Knowl Inf Syst 27, 367–391 (2011). https://doi.org/10.1007/s10115-010-0301-4
DOI: https://doi.org/10.1007/s10115-010-0301-4