Pronouncibility index (Π): a distance-based and confusion-based speech quality measure for dysarthric speakers

  • Regular Paper
Knowledge and Information Systems

Abstract

Many modern speech technologies, including speech synthesis and speech recognition, have recently been developed to help people with disabilities. While most of these technologies have been applied successfully to the speech of speakers without speech disorders, they may not be effective for speakers with a speech disorder, depending on its severity. This paper proposes an automated method for a preliminary assessment of a speaker's ability to pronounce a word. Based on signal features, an indicator called the pronouncibility index (Π) is introduced to express speech quality through two complementary measures, the distance-based and confusion-based factors. For the distance-based factor, the 1-norm, 2-norm and 3-norm distances are investigated, while boundary-based and Gaussian-based approaches are introduced for the confusion-based factor. Π is used to estimate the performance of a speech recognizer when it is applied to the speech of a dysarthric speaker. Three measures are applied to evaluate the effectiveness of Π: rank-order inconsistency, correlation coefficient, and root-mean-square of difference. The evaluation compares the recognition rates predicted by Π with those predicted by two standard methods, the articulatory and intelligibility tests, using two recognition systems (HMM and ANN). For the phoneme-test set (the training set), Π outperforms the articulatory and intelligibility tests on all three measures. The performance of Π decreases for the device-control set (the test set), where the intelligibility test becomes the best method, followed by Π and the articulatory test. Overall, Π is a promising indicator for predicting recognition rate in comparison with the standard assessments.
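The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not define rank-order inconsistency, so the fraction of discordant pairs is assumed here, and all function names and numeric values are hypothetical. The p-norm distance covers the 1-norm, 2-norm and 3-norm cases by setting p to 1, 2 or 3.

```python
import math
from itertools import combinations


def minkowski_distance(x, y, p):
    """p-norm distance between two equal-length feature vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def pearson_correlation(predicted, actual):
    """Correlation coefficient between predicted and actual rates."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    dev_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted))
    dev_a = math.sqrt(sum((a - mean_a) ** 2 for a in actual))
    return cov / (dev_p * dev_a)


def rms_difference(predicted, actual):
    """Root-mean-square of the differences between the two lists."""
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(predicted))


def rank_order_inconsistency(predicted, actual):
    """ASSUMED definition: fraction of speaker pairs whose order differs
    between the predicted and the actual recognition rates."""
    pairs = list(combinations(range(len(predicted)), 2))
    discordant = sum(
        1
        for i, j in pairs
        if (predicted[i] - predicted[j]) * (actual[i] - actual[j]) < 0
    )
    return discordant / len(pairs)


if __name__ == "__main__":
    # Hypothetical rates for three speakers (invented for illustration).
    predicted_rates = [0.9, 0.5, 0.7]
    actual_rates = [0.8, 0.6, 0.4]
    print(rank_order_inconsistency(predicted_rates, actual_rates))
    print(pearson_correlation(predicted_rates, actual_rates))
    print(rms_difference(predicted_rates, actual_rates))
```

Under these assumed definitions, a good predictor gives a rank-order inconsistency and root-mean-square difference close to zero and a correlation coefficient close to one.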



Author information

Corresponding author

Correspondence to Thanaruk Theeramunkong.

About this article

Cite this article

Kayasith, P., Theeramunkong, T. Pronouncibility index (Π): a distance-based and confusion-based speech quality measure for dysarthric speakers. Knowl Inf Syst 27, 367–391 (2011). https://doi.org/10.1007/s10115-010-0301-4

  • DOI: https://doi.org/10.1007/s10115-010-0301-4
