
Analysis of Short-Time Magnitude Spectra for Improving Intelligibility Assessment of Dysarthric Speech

Circuits, Systems, and Signal Processing

Abstract

Information that discriminates dysarthria is present at a fine level in the short-time Fourier transform magnitude spectra (STFT-MS). To capture this information with Mel-frequency cepstral coefficients (MFCCs), this paper first studies the effect of increasing the size of the Mel-filterbank and of the inverse Mel-filterbank. A novel feature extraction technique is then proposed for assessing dysarthria from speech. In the proposed approach, the STFT-MS is processed through an accumulator (digital integrator) to capture the spectral dynamics (SD). Over a given frequency range, the accumulator output is a growing or decaying function of frequency, depending on the peaks and valleys present in that region due to the pitch and the resonance structure of the vocal tract system. The SD over a frequency band is computed from the accumulator output as the non-local difference between linearly spaced, non-overlapping frequency points. The SDs are logarithmically compressed (LCSD) to normalize the magnitudes of the SDs computed in different frequency regions. When computed at M linearly spaced frequency points, the LCSD forms an M-dimensional feature vector. The i-vector-based dysarthria-level assessment system evaluated in this study on the Universal Access (UA-Speech) dysarthric speech database shows that the performance of the MFCC feature improves significantly as the Mel-filterbank size increases. The MFCCs computed from the inverse Mel-filterbank (IMFCCs) contain information complementary to the MFCCs. LCSD outperforms both MFCCs and IMFCCs and also provides information complementary to MFCCs.
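The abstract does not give the filterbank design, so the following is a minimal NumPy sketch of how MFCCs and IMFCCs can be computed for different filterbank sizes. It assumes standard triangular Mel filters (the common 2595·log10(1 + f/700) Mel formula), a 16 kHz sampling rate, a 512-point FFT and 13 cepstral coefficients; these are illustrative choices, not the paper's settings. It also assumes one common construction of the inverse Mel-filterbank, in which the Mel filterbank is flipped along the frequency axis so that the narrow filters sit at high frequencies. The helper names `mel_filterbank`, `inverse_mel_filterbank` and `cepstra` are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    """Hz -> Mel using the common 2595 * log10(1 + f / 700) formula (assumed)."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft=512, sr=16000):
    """Triangular filters equally spaced on the Mel scale; shape (n_filters, n_fft // 2 + 1)."""
    bin_hz = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    edges_hz = mel_to_hz(np.linspace(0.0, hz_to_mel(sr / 2.0), n_filters + 2))
    fb = np.zeros((n_filters, bin_hz.size))
    for i in range(n_filters):
        lo, ctr, hi = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        rising = (bin_hz - lo) / (ctr - lo)
        falling = (hi - bin_hz) / (hi - ctr)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def inverse_mel_filterbank(n_filters, n_fft=512, sr=16000):
    """Flip the Mel filterbank along frequency (and filter order), so the
    narrow, closely spaced filters end up at high frequencies (assumed construction)."""
    return mel_filterbank(n_filters, n_fft, sr)[::-1, ::-1]

def cepstra(mag, fb, n_ceps=13, eps=1e-10):
    """Log filterbank energies of |X|^2 followed by an orthonormal DCT-II."""
    log_e = np.log(np.atleast_2d(mag) ** 2 @ fb.T + eps)  # (n_frames, n_filters)
    n = fb.shape[0]
    basis = np.cos(np.pi / n * np.outer(np.arange(n_ceps), np.arange(n) + 0.5))
    basis *= np.sqrt(2.0 / n)
    basis[0] /= np.sqrt(2.0)
    return log_e @ basis.T  # (n_frames, n_ceps)

# Usage: random windowed frames stand in for real speech; filterbank sizes are illustrative.
rng = np.random.default_rng(0)
frames = rng.standard_normal((98, 400)) * np.hamming(400)
mag = np.abs(np.fft.rfft(frames, n=512, axis=1))
for n_filters in (20, 40, 80):
    mfcc = cepstra(mag, mel_filterbank(n_filters))
    imfcc = cepstra(mag, inverse_mel_filterbank(n_filters))
    print(n_filters, mfcc.shape, imfcc.shape)  # first line printed: 20 (98, 13) (98, 13)
```

Increasing `n_filters` narrows every Mel filter, which gives finer resolution of the low-frequency spectrum for MFCCs; the flipped bank gives the same fine resolution at high frequencies, which is one way to read why IMFCCs can carry information that MFCCs miss.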
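The LCSD steps described in the abstract (accumulate the STFT-MS over frequency, take non-local differences at linearly spaced points, log-compress) can be sketched as follows. This is a hypothetical reading of the abstract, not the authors' implementation: the accumulator is assumed to be a plain running sum of the magnitude along frequency, y[k] = y[k-1] + |X[k]|; the M + 1 boundary points are assumed to be linearly spaced over the whole band; and `abs` plus a small `eps` is assumed before the logarithm. The framing parameters (25 ms Hamming window and 10 ms hop at an assumed 16 kHz rate, 512-point FFT) and the helper names `stft_magnitude` and `lcsd` are also assumptions.

```python
import numpy as np

def stft_magnitude(signal, frame_len=400, hop=160, n_fft=512):
    """|STFT| of Hamming-windowed frames; shape (n_frames, n_fft // 2 + 1)."""
    signal = np.asarray(signal, dtype=float)
    if signal.size < frame_len:
        signal = np.pad(signal, (0, frame_len - signal.size))
    n_frames = 1 + (signal.size - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    return np.abs(np.fft.rfft(frames, n=n_fft, axis=1))

def lcsd(mag, n_points=40, eps=1e-10):
    """Hypothetical LCSD sketch: integrate |X| over frequency, take non-local
    differences at M linearly spaced points, then log-compress."""
    mag = np.atleast_2d(np.asarray(mag, dtype=float))
    n_bins = mag.shape[1]
    if n_points > n_bins:
        raise ValueError("n_points must not exceed the number of frequency bins")
    # Accumulator (digital integrator) along frequency: acc[:, k] = sum(mag[:, :k]).
    acc = np.concatenate([np.zeros((mag.shape[0], 1)), np.cumsum(mag, axis=1)], axis=1)
    # M + 1 linearly spaced boundary indices -> M non-overlapping bands.
    edges = (np.arange(n_points + 1) * n_bins) // n_points
    # Non-local differences of the accumulator output give the spectral dynamics (SD).
    sd = acc[:, edges[1:]] - acc[:, edges[:-1]]
    # Log compression; abs + eps keeps the log finite (an assumption of this sketch).
    return np.log(np.abs(sd) + eps)

# Usage: white noise stands in for one second of speech at an assumed 16 kHz.
rng = np.random.default_rng(0)
x = rng.standard_normal(16000)
feats = lcsd(stft_magnitude(x), n_points=40)
print(feats.shape)  # (98, 40)
```

Under this literal reading, each difference equals the summed magnitude in its band, so the sketch behaves like a linear, rectangular filterbank followed by log compression. The abstract describes the accumulator output as growing or decaying with the spectral peaks and valleys, which a running sum of a non-negative spectrum cannot do on its own, so the paper's accumulator input may differ from the raw magnitude used here; only the line that forms `acc` would need to change.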


[Figs. 1–4: images not available]


Data Availability

The data that support the findings of this manuscript are available on request at https://forms.illinois.edu/sec/1713398.


Author information


Correspondence to Laxmi Priya Sahu.


About this article


Cite this article

Sahu, L.P., Pradhan, G. Analysis of Short-Time Magnitude Spectra for Improving Intelligibility Assessment of Dysarthric Speech. Circuits Syst Signal Process 41, 5676–5698 (2022). https://doi.org/10.1007/s00034-022-02047-x


  • DOI: https://doi.org/10.1007/s00034-022-02047-x
