Research Article
Cross-linguistic perception of clearly spoken English tense and lax vowels based on auditory, visual, and auditory-visual information

https://doi.org/10.1016/j.wocn.2020.100980

Highlights

  • Clear speech benefits depend on the phonetic contrast and the stimulus modality.

  • Acoustic/visual modifications from clear speech conflict with vowel tensity cues.

  • Natives and non-natives show a clear speech disadvantage on visual lax vowels.

  • Results support cue-specific models of clear speech effects.

Abstract

The effect of clear speech on the integration of auditory and visual cues to the tense-lax vowel distinction in English was investigated in native and non-native (Mandarin) perceivers. Clear speech benefits for tense vowels /i, ɑ, u/ were found for both groups across modalities, while lax vowels /ɪ, ʌ, ʊ/ showed a clear speech disadvantage for both groups when presented in the visual-only modality; Mandarin perceivers showed a further disadvantage for lax vowels presented audio-visually, and no difference between speech styles in the auditory-only modality. English perceiver responses were then simulated in an ideal perceiver model, which both identified the auditory (F1, F2, spectral change, duration) and visual (horizontal lip stretch, duration) cues predictive of the clear speech advantage for tense vowels, and indicated which dimensions presented the greatest conflict between cues to tensity and modifications from clear speech (F2 and duration acoustically, duration visually). Altogether, by combining clear speech acoustics, articulation, and perception into a single integrated framework, we are able to identify some of the signal properties responsible for both beneficial and detrimental speech style modifications.

Introduction

Face-to-face speech communication may adopt different forms and styles depending on speaking environments or communicative needs. In auditorily or visually challenging contexts, talkers often alter speech production using a clarified, hyper-articulated speech style to enhance intelligibility. This results in both articulatory and acoustic modifications (Gagné et al., 2002, Helfer, 1997, Moon and Lindblom, 1994, Payton et al., 1994, Picheny et al., 1985, Uchanski et al., 1996). This well-attested style of speech raises important questions as to whether and how these articulatory and acoustic changes are utilized by the perceiver to improve intelligibility. While the question of perceiver benefits has been addressed by several prior studies (Bradlow and Bent, 2002, Ferguson and Kewley-Port, 2002, Krause and Braida, 2002, Picheny et al., 1985, Uchanski et al., 1996), fully addressing these questions requires us to simultaneously understand the talker's (implicit) motivation to modify their articulation, the specific articulatory changes they make, and the resultant effects on perception. This has not been attempted by prior studies. Thus, the present study investigates the entire speech chain by examining the effects of clear (relative to plain) speech on auditory-visual (AV) perception of English tense and lax vowels by native (English) and non-native (Mandarin) perceivers, as well as the association between articulatory-acoustic clear-speech modifications and intelligibility.

Clear speech, a type of hyper-articulation, has been explained within the framework of the H & H (hyper- and hypo-articulation) theory (Lindblom, 1990). Under this view, hyper-articulated speech is typically produced with the intention to enhance sound category discriminability in response to challenging listening situations. Clear speech has been claimed to arise from two levels of modifications: signal-based and code-based (Bradlow & Bent, 2002).

First, talkers could globally modify the signal to enhance general acoustic clarity or saliency (signal-based modifications). For example, they could raise the pitch or change the dynamic pitch range, decrease speaking rate and insert more pauses, or they could increase the amplitude to help separate speech and noise. Such modifications would presumably be uniformly beneficial to all listeners, both native (L1) and non-native (L2).

Second, talkers could also engage what Bradlow and Bent term code-based modifications. Such modifications could enhance acoustic distance between phonemic categories, for example, by altering the formants to make two vowels more phonetically distinct (e.g., Leung, Jongman, Wang, & Sereno, 2016), by non-uniformly modifying segment durations (e.g., lengthening typically longer tense vowels more than lax) (Leung et al., 2016), by producing less vowel reduction (Ferguson & Kewley-Port, 2007), or by just maintaining pronunciation norms (coarticulation, voice onset time) in speech (Ohala, 1995).

Both of these modifications must retain segmental cues and keep those cue values within the intended category, so that phonemic categorical distinctions can be maintained (Moon and Lindblom, 1994, Ohala, 1995). Thus, clear-speech effects must involve coordination of signal- and code-based strategies to enhance as well as preserve phonemic distinctions (Moon and Lindblom, 1994, Ohala, 1995, Smiljanić and Bradlow, 2009). This may be more challenging in cases where signal-based cues like duration or pitch also serve code-based functions.

In considering how clear-speech effects on various cues influence perception, it is clear that cues and their influences cannot be examined individually. McMurray and Jongman (2011), for example, examined 24 distinct cues to fricatives (and see Cole, Linebaugh, Munson, & McMurray, 2010, for applications to vowels). Individually, most, if not all, of these cues were highly variable and insufficient to distinguish the fricatives, and even optimally weighting and combining them could not produce listener-like levels of performance. However, when the same cues were subjected to a simple model that accounted for various causal factors (e.g., talker differences, coarticulation), they predicted listener performance fairly accurately. This suggests that to properly understand the way a given factor (like clear speech) affects perception, one must determine (1) how its effects on multiple cues are weighted and combined to lead to the percept; and (2) how the effect of the factor of interest (e.g., clear speech) fits into the context of other known influences on the acoustics (e.g., talker differences). We accomplish this here by using the Computing Cues Relative to Expectations (C-CuRE) framework (McMurray & Jongman, 2011), which relativizes cues to speaker means and then combines them in a statistical learning model (typically within the logistic family of models) meant to approximate the decision problem presented to listeners in a perception experiment. We use this framework (1) to weight and combine cues; (2) to understand the variety of factors (clear speech and beyond) that influence the acoustics and articulation; and (3) to link acoustic and visible articulatory modifications to response patterns in perception.
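The core relativization step of C-CuRE can be made concrete with a toy simulation. The sketch below is ours, not the authors' implementation: it uses a single invented cue (F1), three hypothetical talkers, and a simple midpoint-threshold classifier in place of the logistic model, purely to show why expressing a cue relative to the talker's mean improves category separation.

```python
import random
from statistics import mean

random.seed(1)

# Hypothetical F1 values (Hz): the tense vowel has a lower F1 than the lax
# vowel, but the (invented) talker differences are larger than the category
# difference, so raw cue values overlap heavily across talkers.
talker_offsets = {"T1": -150.0, "T2": 0.0, "T3": 150.0}
category_means = {"tense": 300.0, "lax": 400.0}

tokens = []  # (talker, category, raw F1)
for talker, offset in talker_offsets.items():
    for category, mu in category_means.items():
        for _ in range(20):
            tokens.append((talker, category, random.gauss(mu + offset, 30.0)))

def accuracy(data):
    """Classify with one threshold midway between the two category means."""
    lo = mean(v for _, c, v in data if c == "tense")
    hi = mean(v for _, c, v in data if c == "lax")
    threshold = (lo + hi) / 2
    correct = sum((v < threshold) == (c == "tense") for _, c, v in data)
    return correct / len(data)

# C-CuRE step: re-express each cue value relative to that talker's mean F1.
talker_means = {t: mean(v for tk, _, v in tokens if tk == t)
                for t in talker_offsets}
relativized = [(t, c, v - talker_means[t]) for t, c, v in tokens]

raw_acc = accuracy(tokens)        # talker variation swamps the contrast
ccure_acc = accuracy(relativized)  # relativized cues separate the categories
```

In the full framework the relativized cues from many dimensions feed a logistic-family classifier rather than a single threshold, but the benefit of partialing out known causal factors such as talker identity is the same.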

Clear speech has been shown to be more intelligible than plain, conversational speech. This is particularly so when listening conditions are challenging, such as in background noise (Ferguson and Kewley-Port, 2002, Ferguson and Quené, 2014, Krause and Braida, 2002, Payton et al., 1994, Uchanski et al., 1996), or when listeners are hearing-impaired (Bradlow et al., 2003, Liu et al., 2004, Picheny et al., 1985) or non-native (Bradlow & Bent, 2002). Clear speech typically yields a recognition gain of about 7–38% of tokens relative to plain speech (Ferguson and Kewley-Port, 2002, Ferguson and Quené, 2014, Maniwa et al., 2009, Payton et al., 1994, Uchanski et al., 1996). This clear-speech advantage has been observed at different linguistic levels, for sentences (Bradlow and Bent, 2002, Gagné et al., 1995, Krause and Braida, 2002, Payton et al., 1994), words (Gagné et al., 1994, Uchanski et al., 1996), and segments (Ferguson and Kewley-Port, 2002, Ferguson and Quené, 2014, Gagné et al., 2002).

Specifically relevant for the current study is research on vowel intelligibility in English. Ferguson (2004) tested the intelligibility of ten English vowels (/i, ɪ, e, ɛ, æ, ɑ, ʌ, o, ʊ, u/ in a /bVd/ context) in plain and clear speech styles with 7 young, healthy adult native English-speaking listeners. The stimuli were presented auditorily in multi-talker babble noise (−10 dB SNR). The results showed that clear speech was on average 8.5% more intelligible than plain speech. Results for individual vowels, as shown in Figure 1 of Ferguson (2004), suggest a significant clear-speech advantage for /æ, ɑ, ʌ/. Detailed analyses of the acoustics of the stimuli in Ferguson (2004), as well as the relation between the acoustics and intelligibility, were provided in a subsequent study by Ferguson and Quené (2014). We will refer to those results in Section 1.3.2 below.

Clear speech can also improve intelligibility in visual (facial) speech perception (Gagné et al., 1994, Gagné et al., 2002, Helfer, 1997, Lander and Capek, 2013, Van Engen et al., 2014). For example, Gagné et al. (1994, 2002) examined the perception of clear and plain French CV syllables (/b, d, g, v, z, ʒ/ + /i, y, a/) and found significant clear-speech gains in the intelligibility of AV, visual-only, and auditory-only presentations. These findings demonstrate the existence of a clear-speech advantage across input modalities, suggesting that clear speech affects not only acoustic cues but also visual cues.

Gagné et al. (2002) suggest the magnitude of the clear-speech benefit in visual speech may be less than in the auditory modality. Moreover, while either speaking clearly or providing visual speech information can be beneficial, the combination of the two can result in greater intelligibility gains than each domain alone (Helfer, 1997). Thus, speech style and modality may interact to affect speech intelligibility. This raises the question of what factors give rise to this interaction. However, research has not systematically explored under what circumstances clear-speech benefits may differ in auditory versus visual conditions.

A critical issue in understanding the mechanisms of these variable intelligibility gains is the question of the degree to which perceivers weight (or use) inputs from different modalities (or different cues within a modality). In AV speech perception, the weight granted to auditory versus visual cues can be affected by the relative quality of the information in each channel (Gagné et al., 2002, Hazan et al., 2010). For example, a compensatory modality weighting effect has been found where perceivers utilize information from an alternate modality (e.g., visual) when the other (auditory) was degraded (Hazan et al., 2010, Van Engen et al., 2014). Similarly, perceivers rely more on the auditory modality for low vowels, as the acoustic cue to vowel height (F1) is more salient, whereas they put more weight on the visual input to perceive rounded vowels since the visual cue (lip-rounding) is more salient (Robert-Ribes et al., 1998, Traunmüller and Öhrström, 2007). Likewise, higher visual perceptual accuracy was found for identification of the visually more salient labial/labio-dental consonantal contrasts compared to visually less salient alveolar/post-alveolar contrasts (Hazan et al., 2006, Wang et al., 2008).
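One minimal way to formalize this kind of modality weighting is a weighted linear combination of the evidence from each channel, with the auditory weight tied to that channel's reliability. The function and all numbers below are invented for illustration; they are not a model fit to any data from the studies cited above.

```python
def av_support(aud, vis, aud_weight):
    """Linear cue combination: the weight on the auditory evidence reflects
    its reliability (0..1); the remaining weight goes to the visual evidence."""
    return aud_weight * aud + (1.0 - aud_weight) * vis

# Clean listening conditions: reliable audio dominates the combined percept.
clean = av_support(aud=0.9, vis=0.5, aud_weight=0.8)   # 0.8*0.9 + 0.2*0.5 = 0.82

# Degraded audio: the same visual evidence now carries most of the weight,
# the compensatory pattern described for noise-degraded input.
noisy = av_support(aud=0.6, vis=0.5, aud_weight=0.3)   # 0.3*0.6 + 0.7*0.5 = 0.53
```

Under this toy scheme the visual channel's contribution grows from 0.10 to 0.35 as auditory reliability drops, which is the compensatory weighting effect in its simplest arithmetic form.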

These patterns of AV weighting raise questions regarding the role of clear speech in AV perception: Does clear speech enhance code-based cues only, making them more salient as category-distinctive cues? Or, does clear speech involve global signal-based enhancement, resulting in increased salience of information across modalities? Or do these enhancements vary across modality?

Although clear speech consistently benefits typical native language adult listeners, research on non-native perception suggests clear speech may be less helpful or even detrimental for L2 listeners (Bradlow and Bent, 2002, Fenwick et al., 2015, Granlund et al., 2012, Smiljanić and Bradlow, 2011). For example, Bradlow and Bent (2002) found substantially smaller clear-speech benefits for non-native listeners as compared to native listeners in the intelligibility of clearly produced English sentences.

What can account for such differences? Bradlow and Bent (2002) suggest that both groups are able to take advantage of signal-based modifications, which are largely language-independent, accounting for the small benefit in L2 listeners. However, these groups may differ in their ability to use code-based modifications. Native speakers have extensive experience with the language and are knowledgeable about the particular phonetic realizations of segments in their language, as well as the higher-level contextual structures. This enables them to make use of code-based modifications. In contrast, non-native speakers have less experience with these aspects of the code (in their L2) and may not have been able to perceive or utilize code-based clear-speech cue enhancements specific to the L2.

Research has provided evidence for a code-based component to the small clear-speech intelligibility gains in non-native listeners. For example, in contrast to non-proficient L2 listeners (Bradlow & Bent, 2002), fluent L2 listeners showed significantly larger clear-speech intelligibility gains in the perception of English sentences (Smiljanić & Bradlow, 2011). Further research at the segmental level has shown that the degree and direction of clear-speech effects on non-native speech intelligibility may depend on the relation between L1 and L2 phonetic inventories. Fenwick et al. (2015) tested AV perception of Sindhi consonants in consonant–vowel syllables in clear and plain speech by Australian-English perceivers. The consonants contrasted both in place of articulation (POA) and voicing, and in their proximity to the perceivers’ L1 (English), with phonologically “two-category” contrasts (/ɓ-ɗ/ [POA] and /f-v/ [voicing]) and phonetic-level “category-goodness” differences (/d̪-ɖ/ [POA] and /t̪-d̪/ [voicing]). The results showed no clear-speech effects for the POA contrasts; for voicing, a clear-speech benefit was found only for the phonetic-level category-goodness differences, not for the two-category contrasts. Clear speech can thus benefit non-native perception when contrasts are perceived as differing in phonetic “category-goodness”, indicating that benefits from within-category enhancement may operate at the “signal” rather than the “code” level (cf. Bradlow & Bent, 2002) for non-native listeners.

These non-native patterns in clear speech reflect the influence of linguistic experience. Clear-speech benefits may be less robust when non-native listeners are less knowledgeable about the sounds in the L2, or about the specific cues to phonetic contrasts in the L2 (Smiljanić & Bradlow, 2009), or when they are less proficient in the L2 (Smiljanić & Bradlow, 2011). Such findings underscore the possibility of code-based modifications that are specific to the phonetics and phonology of the language. On the other hand, non-native listeners may also benefit from clear speech in the L2 when the modifications are perceived as signal-enhancing cues in their L1 (cf. Fenwick et al., 2015), supporting additional, more general signal-based modifications for the clear-speech effect.

Together, results from native and non-native clear-speech perception across AV modalities demonstrate differences in clear-speech benefits that may be triggered by saliency-enhancing (signal-based) and category-enhancing (code-based) cues. However, intelligibility data alone cannot disentangle whether any observed perceptual patterns are directly attributable to signal-based or code-based modifications in production.

Isolating code-based from signal-based effects in clear speech is challenging with intelligibility data alone, particularly for L1 listeners, whose largely uniform linguistic knowledge offers little variation to exploit. In L2 listeners this can be difficult as well, given overlap between the languages and variation in the degree of L2 experience.

In contrast, phonetic studies may be able to isolate code-based changes by examining the specific acoustic and articulatory modifications to aspects of the signal that indicate speech categories. Understanding the details of what is changing acoustically and articulatorily/visually will shed light on differentiating code-based and signal-based effects in clear speech.

Research on acoustic and articulatory correlates of clear speech (Ferguson and Quené, 2014, Ferguson and Kewley-Port, 2002, Ferguson and Kewley-Port, 2007, Leung et al., 2016, Tang et al., 2015, Tasko and Greilick, 2010, Yehia et al., 2002) has shown that clear speech involves more extreme articulatory configurations and correspondingly, more exaggerated acoustic properties than are seen in plain speech. In the acoustic domain, studies examining English vowels produced in controlled segmental contexts (Ferguson and Quené, 2014, Ferguson and Kewley-Port, 2002, Ferguson and Kewley-Port, 2007, Leung et al., 2016) or excised from natural sentential contexts (Hazan and Baker, 2011, Kim and Davis, 2014, Lam et al., 2012, Picheny et al., 1985, Smiljanić and Bradlow, 2008) consistently reveal that vowel duration increases in clear speech relative to plain speech. Given that this is a global lengthening across all vowels, it is assumed to be a signal-based effect.

However, vowel length is also a useful phonetic cue for distinguishing tense and lax vowels. In Leung et al. (2016), measures of both absolute and relative vowel duration showed greater lengthening in clear speech for tense vowels than for lax vowels. These data suggest that clear-speech modifications differentially enhance the properties of individual vowels (tense vowels being intrinsically longer than lax vowels), pointing instead to a code-based modification.

In this same vein, clear and plain vowels also differ in the spectral domain. Clearly produced vowels are characterized by a larger vowel space (F1 × F2 space) than plain vowels (Cooke and Lu, 2010, Ferguson and Kewley-Port, 2007, Ferguson and Quené, 2014, Leung et al., 2016, Smiljanić and Bradlow, 2005), suggesting a code-based modification. Moreover, F1 modifications may also reflect signal-based properties: plain-to-clear-speech modifications generally involve a global increase in F1 regardless of the height of the vowel (Ferguson and Kewley-Port, 2002, Ferguson and Quené, 2014, Huber et al., 1999, Lu and Cooke, 2008). Furthermore, clearly produced vowels are globally found to be more dynamic than plain vowels, as indicated by relative formant changes along the formant trajectories (Ferguson and Kewley-Port, 2002, Ferguson and Kewley-Port, 2007, Leung et al., 2016, Moon and Lindblom, 1994), all suggesting signal-based modification.

However, the degree of vowel dynamicity varies among individual vowels, suggesting a more code-based component. In particular, the more dynamic lax vowels show greater spectral change in clear speech than the intrinsically less dynamic tense vowels (Assmann and Katz, 2005, Ferguson and Kewley-Port, 2007, Hillenbrand and Nearey, 1999, Leung et al., 2016).

Articulatory studies have also revealed both code- and signal-based clear-speech modifications. For example, Tang et al. (2015) examined visible articulatory movements in English vowel production using computational image analysis and showed that talkers produce clear speech with exaggerated visual cues corresponding to code-based articulatory features of different vowels. In particular, in clear compared to plain speech, the results show greater horizontal lip stretch for front unrounded vowels and a greater degree of lip rounding and protrusion for rounded vowels. On the other hand, signal-based modifications are evident in a larger jaw opening across vowels in clear relative to plain speech, which is probably a consequence of generally increased articulatory effort, as also claimed previously (Kim & Davis, 2014).

In sum, these production studies have documented both signal-based and code-based changes in clear speech. Yet the question remains as to how the effects seen in these acoustic and articulatory measurements are linked to intelligibility. In particular, no acoustic or articulatory analysis has yet adopted the more comprehensive approach, as in the C-CuRE framework of McMurray and Jongman (2011), to ask how specific acoustic cues (as opposed to broad measures of clarity like vowel space area) contribute to perception, or how this may be impacted by other sources of variation.

Research relating clear-speech acoustic patterns to perception could be crucial in identifying the locus of the clear-speech advantage as it can reveal which modifications most predict intelligibility gains. Such work is scarce.

Lam et al. (2012) used regression analyses to directly relate acoustic features in clear speech to sentence intelligibility. In clear speech, increases in intelligibility were related to greater increases in the area of the tense vowel space, greater dynamic spectral changes for lax vowels, along with greater reduction in speaking rate and greater increases in intensity. Although not specifically targeting segment-level intelligibility, these findings indicate that enhanced intelligibility in clear speech may be associated with different acoustic cues depending on the features of different sound categories.

Ferguson and Quené (2014) used Generalized Linear Mixed Modeling to relate their acoustic measurements to the intelligibility data reported in Ferguson (2004). Their results are generally in good agreement with those of Lam et al. (2012): a decrease in speaking rate, an increase in F1 (due to greater mouth opening in an effort to increase intensity), and an increase in vowel space area all contributed to a clear-speech intelligibility benefit. In addition, greater F1 and F2 movement over the vowel nucleus in the clear production of the vowels /e, o, ʊ, u/ was also seen to enhance their intelligibility. Thus, like the findings of Lam and colleagues, these results suggest that both signal- and code-based modifications are important.
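For readers unfamiliar with this analysis shape, the sketch below illustrates how trial-level identification accuracy can be regressed on acoustic predictors. It is a deliberately simplified toy: the predictor names, effect sizes, and data are all invented, it fits a fixed-effects-only logistic model by gradient ascent, and it omits the random effects for talker and listener that a Generalized Linear Mixed Model such as Ferguson and Quené's would include.

```python
import math
import random

random.seed(2)

# Hypothetical trial-level data. Each trial has two standardized acoustic
# predictors; identification is more likely to be correct when both are larger.
def simulate(n=400):
    trials = []
    for _ in range(n):
        rate_slowing = random.gauss(0.0, 1.0)     # plain-to-clear rate decrease
        space_expansion = random.gauss(0.0, 1.0)  # vowel-space expansion
        logit = 0.5 + 1.2 * rate_slowing + 0.8 * space_expansion
        p_correct = 1.0 / (1.0 + math.exp(-logit))
        trials.append((rate_slowing, space_expansion,
                       1 if random.random() < p_correct else 0))
    return trials

def fit_logistic(trials, lr=0.05, epochs=2000):
    """Fixed-effects logistic regression fit by full-batch gradient ascent."""
    b0 = b1 = b2 = 0.0
    n = len(trials)
    for _ in range(epochs):
        g0 = g1 = g2 = 0.0
        for x1, x2, y in trials:
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x1 + b2 * x2)))
            err = y - p                 # gradient of the Bernoulli log-likelihood
            g0 += err
            g1 += err * x1
            g2 += err * x2
        b0 += lr * g0 / n
        b1 += lr * g1 / n
        b2 += lr * g2 / n
    return b0, b1, b2

b0, b1, b2 = fit_logistic(simulate())
# Both fitted slopes should be positive: more rate slowing and more
# vowel-space expansion each predict higher identification accuracy.
```

The sign and magnitude of each fitted coefficient play the same interpretive role as in the published analyses: a reliably positive slope for an acoustic modification indicates that the modification contributes to the clear-speech intelligibility benefit.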

In terms of articulation, studies using kinematic measures have shown positive correlations of articulation and acoustics with clear-speech effects on intelligibility (Kim and Davis, 2014, Kim et al., 2011, Tasko and Greilick, 2010). For example, Kim et al. (2011) tracked the motion of facial markers during clear speech produced in quiet or in the presence of background noise (Lombard speech), and coupled this with tests of the audio-visual intelligibility of these productions in noise. Motion tracking results revealed a greater degree of articulatory movement in speech in noise (clear speech) than in quiet (plain speech), with the differences correlated with speech acoustics. Moreover, increased movement of the jaw and mouth (greater degree of opening) during clear speech translated to increased intelligibility, indicating that clear speech is also more visually distinct than plain speech.

With the exception of sentence-level intelligibility (e.g., Kim et al., 2011), research has not examined the degree to which specific articulatory cues contribute to enhanced intelligibility in clear-speech segments, nor is there robust evidence identifying signal- and code-based modifications in acoustic cues that lead to intelligibility gains. The gap in this area of work reveals the need for research to establish the link between specific articulatory and acoustic features used in clear-speech segmental productions and the impact of these features on the intelligibility of clear-speech segments. Critically, here by adopting an explicit computational model of perception (McMurray & Jongman, 2011), we can examine the impact of clear speech on the way in which multiple cues combine to yield perception.

The above-reviewed findings on AV clear-speech intelligibility indicate that the perception of clear-speech effects may depend on factors such as the saliency of the source of modifications (acoustic and articulatory), perceptual weighting in auditory and visual modalities, and perceivers’ linguistic experience. However, research has not systematically examined the extent to which these inter-related factors collectively affect intelligibility, nor is it clear whether these modifications are global signal-based changes, or more phonetically specific, code-based changes. Thus, the current study addressed how speech style interacts with AV input modality and perceiver experience in the intelligibility of clear-speech segments, and what acoustic and articulatory modifications are responsible for these interactions.

Specifically, the present study investigates AV perception of English tense and lax vowels in clear speech by native English and Mandarin (L2) perceivers. This study aims to isolate the effects of signal- and code-based acoustic and articulatory clear-speech modifications on the intelligibility of these vowels in two ways. First, we compare the patterns by native and non-native listeners who may interpret signal- and code-level cues differently based on their native language experience. Second, we relate differences in identification to differences in both signal- and code-based cues measured from the acoustic and visual input.

Tense and lax vowels were chosen as target stimuli due to their unique articulatory and acoustic characteristics in relation to clear-speech features. As noted previously (Leung et al., 2016), features that mark plain-to-clear speech modifications and lax-to-tense vowel contrasts are similar, both involving increased duration, fundamental frequency (f0) and intensity, and more peripheral formant frequencies (associated with an expanded vowel space), as well as increased dynamic temporal and spectral changes (Cooke and Lu, 2010, Ferguson and Kewley-Port, 2002, Ferguson and Kewley-Port, 2007, Ferguson and Quené, 2014, Hazan and Baker, 2011, Kim and Davis, 2014, Krause and Braida, 2002, Lu and Cooke, 2008, Picheny et al., 1985). These similarities provide a unique test case to unravel the underlying mechanisms governing clear-speech production and perception based on how the same physical features may be utilized differently depending on different priorities needed for efficient communication.

In terms of the interactive effects of speech style and input modality, first, we hypothesize greater overall intelligibility for vowels produced in clear speech relative to plain speech. This should be seen across tensity (tense vs. lax vowel stimuli) and modality (A vs. V) conditions. This is based on our previous findings of greater articulatory (jaw, lip) movements (Tang et al., 2015) as well as greater acoustic (temporal, spectral) changes (Leung et al., 2016) in plain-to-clear modifications for both tense and lax vowels. However, based on our findings of greater acoustic distinctions between tense and lax vowels in clear (relative to plain) speech, but similar articulatory plain-to-clear modifications for both tensity categories, we predict that the Speech Style × Input Modality interaction would be reflected in perception as well. In particular, code-based acoustic modifications that result in greater tense-lax differences may enhance auditory intelligibility in clear speech, whereas articulatory modifications that do not differentiate tense and lax vowels should not provide a comparable benefit in the visual domain.

Regarding the effects of linguistic experience, we recruited native Mandarin perceivers as the non-native group in order to test the signal- versus code-based hypothesis for clear speech. Unlike English, Mandarin does not have lax counterparts to its tense vowels, a difference that poses difficulties for Mandarin native speakers in perceiving the tense and lax vowel distinctions in English (Jia et al., 2006, Wang and Munro, 2004). Based on the previous findings of language-specific, code-based clear-speech effects in the auditory domain (Bradlow and Bent, 2002, Smiljanić and Bradlow, 2011), we predict greater clear-speech benefits for native English than for Mandarin perceivers, particularly for perception of the lax vowels that are unfamiliar to the Mandarin perceivers. However, in the visual domain, on the basis of the previous findings that non-native perceivers may utilize signal-based clear-speech enhancements (Fenwick et al., 2015) and that non-native perceivers generally rely more on the visual domain than native perceivers (Hazan et al., 2006, Wang et al., 2008), we expect Mandarin perception in the current study to be more affected by clear than by plain speech (although the effects may be skewed if attention is paid to incorrect visual cues; Hazan et al., 2006, Kirchhoff and Schimmel, 2005, Wang et al., 2008).

Finally, we relate articulatory, acoustic, and perception data to determine the relative weight of each articulatory and acoustic cue in predicting perceiver performance. Extending the previous findings of positive correlations between specific articulatory and acoustic clear-speech modifications and improved overall sentence intelligibility (Ferguson and Kewley-Port, 2002, Kim et al., 2011), we predict similar positive correlations in segmental intelligibility. Furthermore, we expect enhanced clear-speech intelligibility to correlate with those articulatory and acoustic features used to make quantitative modifications, whereas we expect the features used to characterize phonemic categorical contrasts to correlate with identification of different vowels across speech styles.

Section snippets

Perceivers

Twenty-one (19 female) native perceivers of Western Canadian English (aged 19–27, mean: 22) and 30 (18 female) non-native perceivers (aged 18–26, mean: 22) who had Mandarin as their first language (L1) were recruited from the undergraduate and graduate population at Simon Fraser University, Canada. The perceivers reported normal hearing, normal or corrected vision, and no history of speech or language disorders.

The Mandarin perceivers were late, intermediate-level learners of English. According

Results

First, English and Mandarin perceiver identifications of tense and lax vowels were analyzed separately for overall accuracy as a function of speech style and stimulus modality. Accompanying the overall accuracy analysis, we also analyzed the accuracy of identifying specific features (among tensity, height, backness, and rounding distinctions) to understand what specific cue enhancements or distortions underlie the overall effects of clear speech on tense/lax vowel perception. Finally, acoustic

Discussion

Previous research indicates that the perception of clear speech depends on several factors, including the saliency of the source of information (acoustic or articulatory) (Maniwa et al., 2009, Robert-Ribes et al., 1998), the perceptual weighting of auditory and visual cues (Gagné et al., 2002, Helfer, 1997), and the linguistic experience of the perceivers (Bradlow and Bent, 2002, Fenwick et al., 2015). The goal of the present study was to provide a comprehensive approach to the study of clear

Concluding remarks

The approach advocated in the current study is to carefully examine properties of the signal, both acoustic and visual, at the individual cue level to determine which specific properties define the categories that must be identified in perception, and how those properties are affected by changes in speech style. That is, we know from literature on clear-speech acoustics and articulation that modifications are non-uniform across cues, and therefore we expect perceptual uptake of acoustic/visual

Acknowledgements

Portions of this study were presented at the 5th Joint Meeting of the Acoustical Society of America and the Acoustical Society of Japan in Honolulu, Hawaii, 2016. We thank Quentin Qin, Lisa Tang, and members of the SFU Language and Brain Lab for their assistance in stimulus development and data collection. This project was supported by a research grant from the Social Sciences and Humanities Research Council of Canada (SSHRC Insight Grant 435-2012-1641).

References (58)

  • H. Traunmüller et al. (2007). Audiovisual perception of openness and lip rounding in front vowels. Journal of Phonetics.

  • X. Wang et al. (2004). Computer-based training for learning English vowel contrasts. System.

  • H.C. Yehia et al. (2002). Linking facial animation, head motion and speech acoustics. Journal of Phonetics.

  • P.F. Assmann et al. (2005). Synthesis fidelity and time-varying spectral change in vowels. Journal of the Acoustical Society of America.

  • D. Bates et al. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software.

  • A. Bradlow et al. (2002). The clear speech effect for non-native listeners. Journal of the Acoustical Society of America.

  • A.R. Bradlow et al. (2003). Speaking clearly for children with learning disabilities. Journal of Speech, Language, and Hearing Research.

  • C.G. Clopper et al. (2005). Acoustic characteristics of the vowel systems of six regional varieties of American English. Journal of the Acoustical Society of America.

  • M. Cooke et al. (2010). Spectral and temporal changes to speech produced in the presence of energetic and informational maskers. Journal of the Acoustical Society of America.

  • Fenwick, S., Davis, C., Best, C. T., & Tyler, M. D. (2015). The effect of modality and speaking style on the...

  • S. Ferguson (2004). Talker differences in clear and conversational speech: Vowel intelligibility for normal-hearing listeners. Journal of the Acoustical Society of America.

  • S.H. Ferguson et al. (2002). Vowel intelligibility in clear and conversational speech for normal-hearing and hearing-impaired listeners. Journal of the Acoustical Society of America.

  • S.H. Ferguson et al. (2007). Talker differences in clear and conversational speech: Acoustic characteristics of vowels. Journal of Speech, Language, and Hearing Research.

  • S.H. Ferguson et al. (2014). Acoustic correlates of vowel intelligibility in clear and conversational speech for young normal-hearing and elderly hearing-impaired listeners. Journal of the Acoustical Society of America.

  • J.P. Gagné et al. (1994). Across talker variability in auditory, visual, and audiovisual speech intelligibility for conversational and clear speech. Journal of the Academy of Rehabilitative Audiology.

  • J.P. Gagné et al. (1995). Auditory, visual, and audiovisual speech intelligibility for sentence-length stimuli: An investigation of conversational and clear speech. The Volta Review.

  • V. Hazan et al. (2011). Acoustic-phonetic characteristics of speech produced with communicative intent to counter adverse listening conditions. Journal of the Acoustical Society of America.

  • V. Hazan et al. (2006). The use of visual cues in the perception of non-native consonant contrasts. Journal of the Acoustical Society of America.

  • K. Helfer (1997). Auditory and auditory-visual perception of clear and conversational speech. Journal of Speech, Language, and Hearing Research.