Pronouncibility index (Π): a distance-based and confusion-based speech quality measure for dysarthric speakers

Kayasith, Prakasith; Theeramunkong, Thanaruk

doi:10.1007/s10115-010-0301-4


Regular Paper
Published: 13 June 2010

Volume 27, pages 367–391 (2011)
Knowledge and Information Systems

Prakasith Kayasith¹,² and Thanaruk Theeramunkong¹

Abstract

Recently, many modern speech technologies, including speech synthesis and speech recognition, have been developed to help people with disabilities. While most such technologies have been applied successfully to the speech of normal speakers, they may not be effective for speakers with speech disorders, depending on the severity of the disorder. This paper proposes an automated method for preliminarily assessing a speaker's ability to pronounce a word. Based on signal features, an indicator called the pronouncibility index (Π) is introduced to express speech quality through two complementary measures, called the distance-based and confusion-based factors. For the distance-based factor, the 1-norm, 2-norm and 3-norm distances are investigated, while boundary-based and Gaussian-based approaches are introduced for the confusion-based factor. Π is used to estimate the performance of speech recognition when it is applied to the speech of a dysarthric speaker. Three measures are used to evaluate the effectiveness of Π: rank-order inconsistency, correlation coefficient, and root-mean-square of difference. The evaluations compare the recognition rates predicted by Π with those predicted by the standard methods, namely the articulatory and intelligibility tests, on two recognition systems (HMM and ANN). For the phoneme-test set (the training set), Π outperforms the articulatory and intelligibility tests in all three evaluations. The performance of Π decreases on the device-control set (the test set), where the intelligibility test becomes the best method, followed by Π and the articulatory test. Overall, Π is a promising indicator for predicting recognition rates in comparison with the standard assessments.
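The abstract names the distance and evaluation measures without giving their formulas. As an illustration only, the Python sketch below shows textbook versions of them, assuming that each word utterance has already been reduced to a fixed-length feature vector; the feature extraction, the boundary-based and Gaussian-based confusion factors, and the way the factors are combined into Π are not described in the abstract and are not shown. The p-norm (Minkowski) distance covers the 1-, 2- and 3-norm cases; Pearson's r is assumed for the correlation coefficient; and rank-order inconsistency is assumed here to mean the fraction of speaker pairs that the predicted and the actual recognition rates rank in opposite order. All function names and toy values are hypothetical, not taken from the paper.

```python
import math
from itertools import combinations


def minkowski_distance(x, y, p):
    """p-norm (Minkowski) distance; p = 1, 2, 3 give the 1-, 2- and 3-norm distances."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)


def pearson_correlation(pred, actual):
    """Pearson's r between predicted and actual recognition rates (assumed measure)."""
    n = len(pred)
    mean_p = sum(pred) / n
    mean_a = sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(pred, actual))
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in pred))
    sd_a = math.sqrt(sum((a - mean_a) ** 2 for a in actual))
    if sd_p == 0 or sd_a == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_p * sd_a)


def rms_difference(pred, actual):
    """Root-mean-square of the differences between predicted and actual rates."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(pred))


def rank_order_inconsistency(pred, actual):
    """Hypothetical definition: fraction of speaker pairs ranked in opposite order."""
    pairs = list(combinations(range(len(pred)), 2))
    if not pairs:
        raise ValueError("need at least two speakers")
    # A pair is discordant when the two orderings disagree in sign.
    discordant = sum(
        1 for i, j in pairs if (pred[i] - pred[j]) * (actual[i] - actual[j]) < 0
    )
    return discordant / len(pairs)


if __name__ == "__main__":
    # Toy feature vectors (made-up values).
    x, y = [1.0, 2.0, 3.0], [2.0, 0.0, 3.0]
    print([round(minkowski_distance(x, y, p), 3) for p in (1, 2, 3)])  # [3.0, 2.236, 2.08]

    # Toy predicted vs. actual recognition rates for four speakers (made-up values).
    predicted = [0.90, 0.70, 0.40, 0.20]
    actual = [0.85, 0.75, 0.30, 0.35]
    print(pearson_correlation(predicted, actual))
    print(rms_difference(predicted, actual))
    print(rank_order_inconsistency(predicted, actual))  # 1 of 6 pairs is discordant
```

With p = 1 every coordinate difference contributes linearly, while larger p gives more weight to the largest differences, which is one plausible reason to compare several norms. Correlation and rank-order inconsistency only check whether the predictions track the actual rates, whereas the root-mean-square of difference also penalizes a constant offset between them.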

Author information

Authors and Affiliations

Sirindhorn International Institute of Technology (SIIT), Thammasat University, Bangkok, Thailand
Prakasith Kayasith & Thanaruk Theeramunkong
National Electronics and Computer Technology Center (NECTEC), Thailand Science Park, Bangkok, Thailand
Prakasith Kayasith

Corresponding author

Correspondence to Thanaruk Theeramunkong.

About this article

Cite this article

Kayasith, P., Theeramunkong, T. Pronouncibility index (Π): a distance-based and confusion-based speech quality measure for dysarthric speakers. Knowl Inf Syst 27, 367–391 (2011). https://doi.org/10.1007/s10115-010-0301-4

Received: 23 August 2008
Revised: 24 November 2009
Accepted: 03 May 2010
Issue Date: June 2011
