Abstract
We introduce a new framework that improves dysarthric speech recognition by exploiting speech rhythm knowledge. The approach builds speaker-dependent (SD) recognizers according to the dysarthria severity level of each speaker. This severity level is determined by a hybrid classifier that combines class posterior distributions with a hierarchical structure of multilayer perceptrons. Rhythm-based features serve as the input parameters of this classifier, since preliminary evidence from perceptual experiments suggests that rhythm disturbances may be a characteristic common to various types of dysarthria. Speaker-dependent dysarthric speech recognition is then performed using Hidden Markov Models (HMMs). The Nemours database of American dysarthric speakers is used throughout the experiments. The results show the relevance of rhythm metrics and the effectiveness of the proposed framework in improving the performance of dysarthric speech recognition.
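To make the notion of rhythm-based features concrete, the sketch below computes a few duration-based rhythm metrics that are widely used in the speech rhythm literature: %V (the proportion of vocalic time), ΔV and ΔC (the standard deviations of vocalic and consonantal interval durations), and the normalized pairwise variability index (nPVI) over vocalic intervals. The abstract does not state which metrics the framework uses, so this choice is an assumption, as are the function names, the `("V" | "C", duration)` input format, and the use of the population standard deviation. It illustrates the kind of feature vector a severity classifier could take as input and is not the authors' implementation.

```python
from statistics import mean, pstdev


def npvi(durations):
    """Normalized pairwise variability index over successive durations.

    nPVI = 100 * mean(|d_k - d_{k+1}| / ((d_k + d_{k+1}) / 2))
    """
    if len(durations) < 2:
        raise ValueError("nPVI needs at least two intervals")
    terms = [
        abs(a - b) / ((a + b) / 2.0)
        for a, b in zip(durations, durations[1:])
    ]
    return 100.0 * mean(terms)


def rhythm_metrics(intervals):
    """Compute rhythm metrics from a segmented utterance.

    `intervals` is a list of (label, duration_in_seconds) pairs, where the
    label is "V" for a vocalic interval and "C" for a consonantal one.
    This input format is hypothetical, chosen for illustration only.
    """
    vowels = [d for label, d in intervals if label == "V"]
    consonants = [d for label, d in intervals if label == "C"]
    if not vowels or not consonants:
        raise ValueError("need at least one vocalic and one consonantal interval")

    total = sum(vowels) + sum(consonants)
    return {
        "percent_v": 100.0 * sum(vowels) / total,  # %V
        "delta_v": pstdev(vowels),                 # ΔV (population std, an assumption)
        "delta_c": pstdev(consonants),             # ΔC (population std, an assumption)
        "npvi_v": npvi(vowels),                    # vocalic nPVI
    }


if __name__ == "__main__":
    # Toy segmentation with made-up durations, not taken from any database.
    utterance = [("C", 0.08), ("V", 0.15), ("C", 0.12), ("V", 0.30), ("C", 0.07), ("V", 0.11)]
    for name, value in rhythm_metrics(utterance).items():
        print(f"{name}: {value:.3f}")
```

In a pipeline like the one described, such a vector would be computed per utterance or per speaker and passed to the severity classifier, whose output then selects the matching speaker-dependent recognizer.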
Cite this article
Selouani, SA., Dahmani, H., Amami, R. et al. Using speech rhythm knowledge to improve dysarthric speech recognition. Int J Speech Technol 15, 57–64 (2012). https://doi.org/10.1007/s10772-011-9104-6
DOI: https://doi.org/10.1007/s10772-011-9104-6