Using speech rhythm knowledge to improve dysarthric speech recognition

International Journal of Speech Technology

Abstract

We introduce a framework that uses speech rhythm knowledge to improve dysarthric speech recognition. The approach builds speaker-dependent (SD) recognizers matched to the dysarthria severity level of each speaker. The severity level is determined by a hybrid classifier that combines class posterior distributions with a hierarchical structure of multilayer perceptrons. Rhythm-based features serve as the inputs to this classifier, since preliminary evidence from perceptual experiments suggests that rhythm disturbances may be a characteristic common to various types of dysarthria. Speaker-dependent dysarthric speech recognition is then performed using Hidden Markov Models (HMMs). The Nemours database of American dysarthric speakers is used throughout the experiments. The results show the relevance of rhythm metrics and the effectiveness of the proposed framework in improving dysarthric speech recognition performance.
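
The abstract does not list the exact rhythm features used. As an illustration only, the sketch below computes a few duration-based rhythm metrics that are common in the speech rhythm literature (%V, ΔV, ΔC, and pairwise variability indices) from a sequence of vocalic and consonantal interval durations. The function name `rhythm_metrics`, the input format, and the choice of population standard deviation are assumptions made for this example, not details taken from the paper.

```python
from statistics import mean, pstdev


def rhythm_metrics(intervals):
    """Compute duration-based rhythm metrics for one utterance.

    `intervals` is a list of (label, duration_in_seconds) pairs, where the
    label is "V" for a vocalic interval or "C" for a consonantal interval.
    This input format is hypothetical; a real system would derive it from
    a phonetic segmentation of the speech signal.
    """
    vowels = [d for label, d in intervals if label == "V"]
    consonants = [d for label, d in intervals if label == "C"]
    if len(vowels) < 2 or len(consonants) < 2:
        raise ValueError("need at least two vocalic and two consonantal intervals")

    total = sum(vowels) + sum(consonants)

    # %V: proportion of the utterance made up of vocalic intervals.
    percent_v = 100.0 * sum(vowels) / total

    # Delta-V and Delta-C: standard deviation of interval durations
    # (population standard deviation is an assumption of this sketch).
    delta_v = pstdev(vowels)
    delta_c = pstdev(consonants)

    # Normalized pairwise variability index over successive vocalic intervals.
    npvi_v = 100.0 * mean(
        [abs(a - b) / ((a + b) / 2.0) for a, b in zip(vowels, vowels[1:])]
    )

    # Raw pairwise variability index over successive consonantal intervals.
    rpvi_c = mean([abs(a - b) for a, b in zip(consonants, consonants[1:])])

    return {
        "percent_v": percent_v,
        "delta_v": delta_v,
        "delta_c": delta_c,
        "npvi_v": npvi_v,
        "rpvi_c": rpvi_c,
    }


if __name__ == "__main__":
    # Toy utterance: alternating consonantal and vocalic intervals.
    example = [("C", 0.10), ("V", 0.10), ("C", 0.10), ("V", 0.30)]
    for name, value in rhythm_metrics(example).items():
        print(f"{name}: {value:.4f}")
```

A feature vector of this kind could be passed to a severity classifier, whose output would then select which speaker-dependent recognizer to apply. How the paper actually combines these steps is described only at the level of the abstract above.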



Author information

Corresponding author

Correspondence to S.-A. Selouani.

About this article

Cite this article

Selouani, SA., Dahmani, H., Amami, R. et al. Using speech rhythm knowledge to improve dysarthric speech recognition. Int J Speech Technol 15, 57–64 (2012). https://doi.org/10.1007/s10772-011-9104-6

  • DOI: https://doi.org/10.1007/s10772-011-9104-6
