Speech Communication

Volume 125, December 2020, Pages 61-68
Duration of the rhotic approximant /ɹ/ in spastic dysarthria of different severity levels

https://doi.org/10.1016/j.specom.2020.09.006

Highlights

  • Utilization of duration of rhotic approximant /ɹ/ to assess dysarthric speech.

  • Utilization of QCP spectrograms in measuring the duration of rhotic approximant.

  • This study reveals the effect of dysarthria severity on the duration of /ɹ/.

  • In addition, the study shows the effect of phonetic context on /ɹ/ duration.

Abstract

Dysarthria is a motor speech disorder leading to imprecise articulation of speech. Acoustic analysis capable of detecting and assessing articulation errors is useful in dysarthria diagnosis and therapy. Since speakers with dysarthria experience difficulty in producing rhotics due to complex articulatory gestures of these sounds, the hypothesis of the present study is that duration of the rhotic approximant /ɹ/ distinguishes dysarthric speech of different severity levels. Duration measurements were conducted using the third formant (F3) trajectories estimated from quasi-closed-phase (QCP) spectrograms. Results indicate that the severity level of spastic dysarthria has a significant effect on duration of /ɹ/. In addition, the phonetic context has a significant effect on duration of /ɹ/, the ɪ-r-ɛ context showing the largest difference in /ɹ/ duration between dysarthric speech of the highest severity levels and healthy speech. The results of this preliminary study can be used in the future to develop signal processing and machine learning methods to automatically predict the severity level of spastic dysarthria from speech signals.

Introduction

Dysarthria is a disorder resulting from weaknesses of neuromuscular execution in motor speech production due to conditions such as brain tumours, brain injury, stroke, cerebral palsy and facial paralysis (Duffy, 2013). It causes defects in the articulation of speech sounds, which reduces the intelligibility of speech (Doyle et al., 1997). Previous acoustic studies have indicated that dysarthric speech shows increased word and syllable durations, slower transitions between phonemes, and reduced overall speech rate, all of which are correlated with a reduced range of articulatory movements in patients with spastic dysarthria (Kent et al., 1992). Slower speaking rate and prolongation of syllable duration have been reported in a few previous studies as distinguishing features of dysarthric speech. A study of syllable duration showed that speakers with spastic and ataxic dysarthria exhibit greater mean syllable duration than healthy controls (Turner and Weismer, 1993). Another study on syllabic timing revealed that the reduced range of articulatory movements gives rise to a prolongation of syllables in speakers with cerebellar dysarthria (Ackermann and Hertrich, 1994). However, it has also been reported that prolongation affects some syllables more than others (Kent et al., 1999).

Articulatory complexity in the manner of articulation (MOA) and place of articulation (POA) can affect the duration of speech units in dysarthria (Van Nuffelen et al., 2009). A study on tongue tip kinematic deviations in speakers with spastic dysarthria showed a reduced range of movements during the production of alveolar sounds (Kim et al., 2010b). Moreover, a study on native speakers of American English revealed that POAs such as alveolar, post-alveolar, and palato-alveolar involve difficult articulatory gestures for speakers with cerebral palsy (Kim et al., 2010a). The investigation by Kim et al. (2010a) prompted the present authors to study the effects of dysarthria severity on the production of alveolar sounds, more specifically, the rhotic approximant /ɹ/. The hypothesis of this study is that the severity level of spastic dysarthria is reflected in the duration of the rhotic approximant /ɹ/ because of the complex articulatory gestures required in the production of rhotic sounds. It is expected that the duration of the rhotic approximant /ɹ/ increases as a function of dysarthria severity. In addition, the relative change of duration between dysarthric speech of different severity levels and healthy speech is expected to be more prominent in the rhotic approximant /ɹ/ than in the overall word duration. This study of the duration of the rhotic approximant /ɹ/ is a preliminary step in efforts to obtain better acoustic measures for automatic prediction of the dysarthria severity level from speech signals.
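The comparison of relative duration change described above can be illustrated with a minimal sketch. It assumes that relative change is defined as the difference between the mean dysarthric and mean healthy durations, divided by the mean healthy duration; the function name `relative_increase` and all duration values are hypothetical and are not taken from the study.

```python
from statistics import mean


def relative_increase(dysarthric_ms, healthy_ms):
    """Relative change of mean duration with respect to healthy speech.

    Assumed definition (not confirmed by the source):
    (mean dysarthric - mean healthy) / mean healthy.
    """
    healthy_mean = mean(healthy_ms)
    return (mean(dysarthric_ms) - healthy_mean) / healthy_mean


# Hypothetical durations in milliseconds, for illustration only.
healthy_r = [80.0, 100.0]          # /r/ segments, healthy speakers
dysarthric_r = [170.0, 190.0]      # /r/ segments, dysarthric speakers
healthy_word = [400.0, 500.0]      # whole words, healthy speakers
dysarthric_word = [600.0, 750.0]   # whole words, dysarthric speakers

r_change = relative_increase(dysarthric_r, healthy_r)
word_change = relative_increase(dysarthric_word, healthy_word)

# The hypothesis predicts that the segment-level change exceeds the word-level change.
print(f"/r/ change: {r_change:.2f}, word change: {word_change:.2f}")
```

With these invented values the segment-level change is larger than the word-level change, which is the pattern the hypothesis predicts; real data may of course behave differently.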

In normal speech, segmental duration depends on several factors such as phonetic context, lexical stress, phrase boundaries, speaking rate and gender (van Santen and Olive, 1990, Tsao and Weismer, 1997, Robb et al., 2005, Van Borsel and De Maesschalck, 2008). The investigations by Simpson (2001, 2009) indicated that gender could be an important factor in accounting for changes in the duration of speech sounds and that, due to differences in vocal tract cross-section, the articulatory distances required to attain phonetic targets differ between males and females. In addition, previous studies investigating the effect of phonetic context on the duration of speech segments have revealed that phonetic context alters the duration of sound units (McCauley and Skenes, 1987, Jongman, 1989, Mendoza et al., 2003, Koenig, 2007). In particular, the studies by Lockenvitz et al. (2015) and Narayanan et al. (1999) showed that the segmental duration of rhotic sounds is longer in vocalic contexts than in consonantal contexts. From the above studies, it can be concluded that gender and phonetic context affect the duration of the rhotic approximant in normal speech. Therefore, the present study considers gender and phonetic context, along with dysarthria severity, as factors in investigating the duration of the rhotic approximant /ɹ/.

Several studies have shown that word durations are longer in utterances spoken by speakers with spastic dysarthria than in those spoken by healthy talkers (Kent et al., 1979, Turner and Weismer, 1993, Ackermann and Hertrich, 1994). Moreover, the study by Liss et al. (2009) indicated that the duration of vocalic and consonantal segments is longer in speakers with spastic dysarthria than in speakers with Parkinsonian or ataxic dysarthria. The study by Lee and Hustad (2013) revealed that the duration of speech units in children with cerebral palsy increases with the severity of dysarthria. Previous investigations on spastic dysarthria have focused on the relationship between dysarthria severity and the duration of the overall utterance (e.g. word duration). In contrast, the present study addresses the effect of dysarthria severity on the duration of the alveolar sound /ɹ/. Duration of /ɹ/ is studied using the third formant (F3) trajectory estimated from speech. The F3 trajectory is estimated from the spectrogram computed using quasi-closed-phase (QCP) analysis, which has been shown to be an accurate method for estimating formants (Airaksinen et al., 2014). The effects of three factors (speaker gender, dysarthria severity, phonetic context of /ɹ/) on the duration of /ɹ/ are analysed using dysarthric speech data of the UA-Speech database (Kim et al., 2008). In addition, the prolongation of /ɹ/ duration due to dysarthria is compared to the prolongation of word duration.
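As a rough illustration of measuring duration from an F3 trajectory, the sketch below takes the longest contiguous run of frames whose F3 lies below a threshold and converts it to milliseconds. This is only an assumption-laden sketch: the study describes a visual analysis of QCP spectrograms and does not state a numeric criterion here, so the function name `rhotic_duration_ms`, the 2000 Hz threshold, the 5 ms frame hop and the example track are all hypothetical.

```python
def rhotic_duration_ms(f3_hz, hop_ms=5.0, f3_threshold_hz=2000.0):
    """Duration of the longest contiguous low-F3 region, in milliseconds.

    f3_hz           : per-frame F3 estimates in Hz (None for frames without an estimate)
    hop_ms          : assumed frame hop in milliseconds (hypothetical default)
    f3_threshold_hz : assumed F3 ceiling for a rhotic frame (hypothetical default)
    """
    longest = 0
    current = 0
    for f3 in f3_hz:
        if f3 is not None and f3 < f3_threshold_hz:
            current += 1                      # extend the current low-F3 run
            longest = max(longest, current)
        else:
            current = 0                       # run broken by a high or missing F3 value
    return longest * hop_ms


# Hypothetical F3 track (Hz) for a vowel-/r/-vowel sequence, one value per frame.
f3_track = [2600, 2500, 2300, 1900, 1700, 1600, 1650, 1800, 2100, 2500]
duration = rhotic_duration_ms(f3_track)
print(f"Estimated /r/ duration: {duration} ms")
```

A fixed threshold is a simplification; F3 ranges differ between speakers, so any automatic criterion would likely need speaker-dependent adjustment.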

Section snippets

Studies on /ɹ/ in healthy speakers

Alveolar approximants produced by healthy speakers of American English have been studied widely in the literature (Espy-Wilson et al., 2000, Zhou et al., 2007, Arai, 2014). Based on the POA and MOA, the rhotic sounds are broadly classified into trills, approximants, taps, flaps, and fricatives. In American English dialects, the most prevalent rhotic sound is the rhotic approximant /ɹ/ (Alwan et al., 1997). Different articulatory configurations involved in the production of rhotic sounds in American

Articulation

A study on American English /ɹ/ reported six different articulatory configurations in the production of rhotic sounds (Delattre and Freeman, 1968). These are broadly classified into alveolar, post-alveolar, and bunched articulations (Alwan et al., 1997, Harper et al., 2016, Zhou et al., 2007, Arai, 2014). In both alveolar and post-alveolar articulations, two constrictions are formed. The first constriction is formed in the lower pharynx region using tongue root and the second constriction is

The experimental setup

In order to estimate the duration of /ɹ/ using the F3 trajectory, this study takes advantage of spectrograms computed with QCP analysis (Airaksinen et al., 2014). Duration of /ɹ/ is measured from healthy and dysarthric speech of the UA-Speech database (Kim et al., 2008). The UA-Speech database provides information about three attributes (speaker gender, dysarthria severity, phonetic context of /ɹ/) whose impact on duration of /ɹ/ is investigated. In the following sub-sections, QCP analysis, the

Results

Previous studies have shown that word durations are longer in dysarthric speech than in healthy speech (Turner and Weismer, 1993, Ackermann and Hertrich, 1994). Moreover, word duration is more straightforward to measure than /ɹ/ duration. Therefore, an experiment was first conducted in the current study to examine whether duration of /ɹ/ indicates changes in the severity level of dysarthria better than word duration does. If duration of /ɹ/ turned out to be a better

Discussion

The present study investigated duration of the rhotic approximant /ɹ/ in dysarthric speech in American English as a function of dysarthria severity, phonetic context, and gender. The study showed that duration of /ɹ/ was significantly longer in dysarthric speech than in healthy speech. In addition, it was shown that the relative increase in duration of /ɹ/ due to dysarthria was larger than the corresponding relative duration increase in words. Duration of the rhotic approximant produced in

Conclusion

Duration of the American English rhotic approximant /ɹ/ was studied in this investigation in the context of spastic dysarthria severity. Duration values were estimated from F3 trajectories that were obtained by visually analysing QCP spectrograms using a known, straightforward criterion in rhotic detection. The analysis was conducted for dysarthric and healthy speech using the UA-Speech database. The study focused on the effects of three factors (gender, dysarthria severity, phonetic context)

CRediT authorship contribution statement

Krishna Gurugubelli: Conception and design of study, Analysis and/or interpretation of data, Drafting the manuscript, Revising the manuscript critically for important intellectual content. Anil Kumar Vuppala: Conception and design of study, Analysis and/or interpretation of data, Drafting the manuscript, Revising the manuscript critically for important intellectual content. N.P. Narendra: Conception and design of study, Analysis and/or interpretation of data, Drafting the manuscript, Revising

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This study was partly funded by the Academy of Finland (project number 330139).

References (44)

  • Kent, R.D. et al., 1999. Acoustic studies of dysarthric speech: Methods, progress, and potential. J. Commun. Disord.
  • Mendoza, E. et al., 2003. Temporal variability in speech segments of Spanish: Context and speaker related differences. Speech Commun.
  • van Santen, J.P. et al., 1990. The analysis of contextual effects on segmental duration. Comput. Speech Lang.
  • Ackermann, H. et al. Speech rate and rhythm in cerebellar dysarthria: An acoustic analysis of syllabic timing.
  • Airaksinen, M. et al., 2014. Quasi closed phase glottal inverse filtering analysis with weighted linear prediction. IEEE/ACM Trans. Audio Speech Lang. Process.
  • Alku, P. et al., 2013. Formant frequency estimation of high-pitched vowels using weighted linear prediction. J. Acoust. Soc. Am.
  • Alwan, A. et al., 1997. Toward articulatory-acoustic models for liquid approximants based on MRI and EPG data. Part II. The rhotics. J. Acoust. Soc. Am.
  • Arai, T., 2013. On why Japanese /r/ sounds are difficult for children to acquire. In: Proc. Interspeech. pp....
  • Arai, T., 2014. Retroflex and bunched English /r/ with physical models of the human vocal tract. In: Proc. Interspeech....
  • Boyce, S. et al., 1997. Coarticulatory stability in American English /r/. J. Acoust. Soc. Am.
  • Delattre, P. et al., 1968. A dialect study of American r’s by X-ray motion picture. Linguistics.
  • Doyle, P.C. et al., 1997. Dysarthric speech: a comparison of computerized speech recognition and listener intelligibility. J. Rehabil. Res. Dev.
  • Duffy, J.R., 2013. Motor Speech Disorders: Substrates, Differential Diagnosis, and Management.
  • Espy-Wilson, C.Y., 1994. A feature-based semivowel recognition system. J. Acoust. Soc. Am.
  • Espy-Wilson, C.Y. et al., 2000. Acoustic modeling of American English /r/. J. Acoust. Soc. Am.
  • Gowda, D., Airaksinen, M., Alku, P., 2016. Quasi closed phase analysis of speech signals using time varying weighted...
  • Harper, S., Goldstein, L., Narayanan, S.S., 2016. L2 acquisition and production of the English rhotic pharyngeal...
  • Jacewicz, E. et al., 2007. Vowel duration in three American English dialects. Am. Speech.
  • Jacewicz, E. et al., 2010. Between-speaker and within-speaker variation in speech tempo of American English. J. Acoust. Soc. Am.
  • Jongman, A., 1989. Duration of frication noise required for identification of English fricatives. J. Acoust. Soc. Am.
  • Kaland, C., Galatà, V., Spreafico, L., Vietti, A., 2016. /r/ as language marker in bilingual speech production and...
  • Kent, J.F. et al., 1992. Quantitative description of the dysarthria in women with amyotrophic lateral sclerosis. J. Speech Lang. Hear. Res.