Effects of aging on the ability to benefit from prior knowledge of message content in masked speech recognition

https://doi.org/10.1016/j.specom.2011.11.003

Abstract

In the presence of competing talkers, presenting the early part of a target sentence in quiet improves recognition of the last keyword of the sentence. This content-priming effect depends on a working-memory resource that holds the information in the early-presented part of the target speech (the content prime). Older adults typically show declines in working memory and have more difficulty recognizing speech under “cocktail-party” conditions. This study investigated whether speech masking also affects recall of the content prime and whether the content-priming effect declines in older adults. The results show that in both younger and older adults, although the content prime was heard in quiet, recall of keywords in the prime was significantly affected by the signal-to-masker ratio of the target/masker presentation. Prime recall was more vulnerable to speech masking in older adults than in younger adults. In addition, the content-priming effect disappeared in older adults, even though they were able to use the content prime to identify the target speech in the presence of competing talkers. Thus, a speech masker affects not only recognition but also recall of speech, and there is an age-related decline in both content-priming-based unmasking of the target speech and recall of the prime.

Highlights

► Presenting the early part of a target sentence in quiet beforehand improves recognition of its last keyword. ► Recall of prime keywords is affected by speech masking. ► Prime recall is more vulnerable to speech masking in older adults. ► The content-priming effect disappears in older adults. ► Both younger and older adults can use the content prime to identify the target speech when multiple people are talking.

Introduction

The cocktail-party problem, “How do we recognize what one person is saying when others are speaking at the same time?”, posed by Cherry (1953), has been an important issue in psychology, neurophysiology, signal processing, and computer engineering for half a century. It reflects humans’ remarkable ability to detect, locate, discriminate, and identify individual speech sources in the presence of competing talkers.

To improve recognition of target speech in a noisy environment with competing talkers, listeners use perceptual/cognitive cues available in the environment to facilitate selective attention to the target speech and/or to suppress the influence of competing speech. A masker produces energetic masking when the peripheral neural activity it elicits overwhelms the activity elicited by the signal, yielding a degraded or noisy neural representation from which subsequent cognitive processes have difficulty extracting the signal (Freyman et al., 1999, Kidd et al., 1994, Kidd et al., 1998, Leek et al., 1991). However, some of the perceptual/cognitive cues used by listeners do not substantially affect energetic masking. These cues include precedence-effect-induced spatial separation between the target image and the masker image (Freyman et al., 1999, Freyman et al., 2001, Huang et al., 2008, Huang et al., 2009, Li et al., 2004, Rakerd et al., 2006, Wu et al., 2005), prior knowledge about where and/or when the target speech will occur (Best et al., 2007, Best et al., 2008, Kidd et al., 2005), knowledge of or familiarity with the target talker’s voice (Brungart et al., 2001, Helfer and Freyman, 2009, Huang et al., 2010, Newman and Evers, 2007, Yang et al., 2007), prior knowledge about the topic of the target sentence (Helfer and Freyman, 2008), and seeing the movements of the talker’s speech articulators (Grant and Seitz, 2000, Helfer and Freyman, 2005, Rosenblum et al., 1996, Rudmann et al., 2003, Sumby and Pollack, 1954, Summerfield, 1979). 
It appears that many perceptual/cognitive cues that help listeners attend selectively to the target speech and ignore competing speech can improve recognition of the target speech against competing speech by reducing informational masking (for the concept of informational masking, see Arbogast et al., 2002, Agus et al., 2009, Freyman et al., 1999, Helfer and Freyman, 2009, Kidd et al., 1994, Kidd et al., 1998, Kidd et al., 2005, Leek et al., 1991, Schneider et al., 2007).

In addition to the cues described above, prior knowledge (memory) of the early part of a target sentence (i.e., the content prime) improves listeners’ recognition of speech in a masker. More specifically, when either a noise masker or a speech masker is present, recognition of the last (third) keyword in a three-keyword sentence improves if the content prime, an early segment of the same sentence that includes the first two keywords, is presented in quiet (Ezzatian et al., 2011, Freyman et al., 2004, Yang et al., 2007). Because the target sentences used in these studies are meaningless (“nonsense”), the content prime gives listeners no contextual support for recognizing the last keyword. Moreover, the priming benefit is much larger when the masker is speech than when it is noise (Ezzatian et al., 2011, Freyman et al., 2004, Yang et al., 2007). As suggested by Freyman et al. (2004), the content prime mainly helps listeners focus attention on the target more quickly, thereby facilitating recognition of the last keyword in the target stream against speech informational masking, “which is caused by confusion between the target and masker and/or uncertainty regarding the target” (Helfer and Freyman, 2009).

It should be emphasized that, in humans, the content-priming effect depends on a memory resource that holds the prime-content information during the target/masker presentation. However, working memory, a system for the temporary storage and processing of information during the performance of cognitive tasks (Baddeley, 1986), is vulnerable to disruption. Recall of the content prime may therefore be affected by the presentation of the masker, particularly at low signal-to-masker ratios (SMRs). Previous human studies of the content-priming effect (Ezzatian et al., 2011, Freyman et al., 2004, Yang et al., 2007) did not report how accurately the prime was recalled. One purpose of this study was to investigate whether recall of keywords in the content prime is affected by speech masking.
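The SMR manipulation referred to above can be illustrated with a short sketch. The excerpt does not state how SMR was defined or set in this study; the sketch assumes the common convention of comparing root-mean-square (RMS) levels, SMR (dB) = 20·log10(RMS_target / RMS_masker), and rescaling the masker to reach a requested SMR. The helper names `rms`, `smr_db`, and `scale_masker_to_smr` are hypothetical, the sinusoids stand in for recorded speech, and the SMR values in the demo are arbitrary examples, not the study's conditions.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a non-empty sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def smr_db(target, masker):
    """Signal-to-masker ratio in dB, assuming an RMS-based definition."""
    return 20.0 * math.log10(rms(target) / rms(masker))


def scale_masker_to_smr(target, masker, desired_smr_db):
    """Hypothetical helper: return a gain-scaled copy of `masker` so that
    smr_db(target, scaled) equals `desired_smr_db` (RMS convention).
    A lower requested SMR makes the masker louder relative to the target;
    the target itself is left unchanged."""
    gain = 10.0 ** ((smr_db(target, masker) - desired_smr_db) / 20.0)
    return [gain * s for s in masker]


if __name__ == "__main__":
    # Toy stand-ins for target and masker speech: 1 s of two sinusoids.
    fs = 16000
    target = [math.sin(2 * math.pi * 440 * n / fs) for n in range(fs)]
    masker = [0.5 * math.sin(2 * math.pi * 300 * n / fs) for n in range(fs)]
    # Arbitrary example SMRs (not the study's conditions).
    for smr in (4.0, 0.0, -4.0, -8.0):
        scaled = scale_masker_to_smr(target, masker, smr)
        print(f"requested {smr:+.1f} dB, measured {smr_db(target, scaled):+.2f} dB")
```

Scaling the masker rather than the target keeps the target level constant across SMR conditions, which is one common design choice; the excerpt does not describe the study's own calibration procedure.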

Older adults often have difficulty understanding speech when multiple people are talking at the same time (e.g., Agus et al., 2009, Cheesman et al., 1995, Duquesnoy, 1983, Frisina and Frisina, 1997, Gelfand et al., 1988, Helfer and Freyman, 2008, Helfer and Freyman, 2009, Helfer and Wilber, 1990, Helfer et al., 2010, Huang et al., 2008, Huang et al., 2010, Humes and Roberts, 1990, Jerger et al., 1991, Rossi-Katz and Arehart, 2009, Schneider et al., 2000, Tun et al., 2002). These difficulties may be due both to age-related bottom-up deficits at the sensory level (including reduced temporal and/or spectral sensitivity) and to age-related top-down deficits at the cognitive level (including declines in selective attention, working memory, inhibitory control, and processing speed) (for reviews see Schneider, 1997, Schneider et al., 2007). In particular, working memory generally declines in older adults (Salthouse, 1991, Verhaeghen et al., 1993). Previous studies have shown that, in addition to peripheral contributions to audibility, cognitive factors such as working memory, attention, inhibitory control, and speed of processing contribute significantly to speech perception, particularly under noisy listening conditions (for reviews see Humes, 2007, Schneider et al., 2007). Age-related declines in cognitive function may therefore also be associated with age-related impairment of speech recognition. Of particular relevance to this study, the inhibitory-deficit hypothesis (Hasher and Zacks, 1988) proposes that the age-related decline in working memory results from a reduced ability to inhibit irrelevant information. Weakened inhibitory mechanisms fail to keep irrelevant information from entering working memory and occupying its storage capacity and processing resources, which reduces effective working-memory capacity. 
Because the content prime presented in quiet is immediately followed by the target/masker complex, recall of keywords in the prime is predicted to be vulnerable to speech masking, particularly for older adults. In addition, if there is an age-related deficit in the memorial preservation of the prime signal, the content-priming effect should be reduced in older adults. However, a recent study by Ezzatian et al. (2011) showed that English-speaking older adults gain as much benefit from content priming as younger controls, suggesting that older adults are as capable as younger adults of using the prime to parse the auditory scene and recognize words. This study therefore also investigated whether recall of the prime content is more affected by the speech masker in older adults than in younger adults, and whether Chinese-speaking older adults show an age-related reduction in the benefit from content priming.

In previous studies of the content-priming effect (Ezzatian et al., 2011, Freyman et al., 2004, Yang et al., 2007), the content prime was not the only cue for segregating the target speech from competing (masking) speech. In each test trial of these studies, the target sentence began about 1 s after the onset of the masker, and listeners were instructed to attend to the sentence with the delayed onset and to repeat it after all stimuli had ended. Listeners thus relied heavily on the masker/target onset delay to reduce target/masker confusion and to determine quickly which stream in the target/masker complex was the target. If the onset delay is removed, the content prime becomes the only semantic cue helping listeners attend to the target sentence (see Helfer and Freyman, 2009). The present study specifically investigated whether the content prime can be used to determine the target-speech stream in the presence of competing talkers when the onset-delay cue is absent, and whether increasing the prime length from four syllables to eight syllables improves recognition of the last keyword in younger and older adults.

Section snippets

Participants

Twenty-four younger adults (15 females and 9 males; mean age = 24.0 yr, range 20–27 yr) recruited from Peking University and 12 older adults (7 females and 5 males; mean age = 66.4 yr, range 57–75 yr) recruited from the local community participated in Experiment 1 of this study. All participants had symmetrical hearing (no more than a 15-dB difference between the two ears). Younger participants had pure-tone hearing thresholds no higher than 25 dB HL between 0.125 and 8 kHz, and the older

Experiment 2: Without the onset delay cue

In Experiment 2, the masker/target onset delay was removed, so the content prime became the only semantic cue providing participants with a means of attending to the target sentence. Experiment 2 thus investigated whether listeners are still able to determine the target-speech stream in a speech complex of three female voices when the onset-delay cue is absent, and whether an increase of the prime length from four syllables (including the first keyword) to eight syllables (including both

Recall of keywords in the prime content was affected by speech masking

As mentioned in Section 1, under “cocktail-party” listening conditions, listeners can use various perceptual/cognitive cues to attend selectively to the target speech and ignore competing speech, releasing the target speech from informational masking. The content prime helps listeners focus attention on the target more quickly and facilitates recognition of the last keyword in the target stream (Ezzatian et al., 2011, Freyman et al., 2004, Yang et al., 2007).

Conclusions

The results of this study indicate that the content prime plays a role in releasing speech from informational masking. The content prime helps listeners determine the target-speech stream within a mixture of multiple talkers. Thus, even when the powerful onset-delay cue is absent, listeners are still able to use this semantic cue to determine and follow the target speech against speech masking. Moreover, recall of the content prime is also affected by speech masking, and the masking effect is

Acknowledgments

This work was supported by the “973” National Basic Research Program of China (2009CB320901; 2010DFA31520; 2011CB707805), the National Natural Science Foundation of China (31170985; 30711120563, 90920302, 60811140086), the Chinese Ministry of Education (20090001110050), and “985” grants from Peking University.

References (59)

  • V. Best et al.

    Object continuity enhances selective auditory attention

    Proc. Natl. Acad. Sci. USA

    (2008)
  • V. Best et al.

    Visually-guided attention enhances target identification in a complex auditory scene

    J. Assoc. Res. Otolaryngol.

    (2007)
  • D.S. Brungart et al.

    Informational and energetic masking effects in the perception of multiple simultaneous talkers

    J. Acoust. Soc. Amer.

    (2001)
  • S.-Y. Cao et al.

    Improvement of intelligibility of ideal binary-masked noisy speech by adding background noise

    J. Acoust. Soc. Amer.

    (2011)
  • M.F. Cheesman et al.

    Comparison of growth of masking functions and speech discrimination abilities in younger and older adults

    Audiology

    (1995)
  • E.C. Cherry

    Some experiments on the recognition of speech, with one and with two ears

    J. Acoust. Soc. Amer.

    (1953)
  • A.J. Duquesnoy

    Effect of a single interfering noise or speech source upon the binaural sentence intelligibility of aged persons

    J. Acoust. Soc. Amer.

    (1983)
  • P. Ezzatian et al.

    The effect of priming on release from informational masking is equivalent for younger and older adults

    Ear Hear.

    (2011)
  • R.L. Freyman et al.

    Spatial release from informational masking in speech recognition

    J. Acoust. Soc. Amer.

    (2001)
  • R.L. Freyman et al.

    Effect of number of masking talkers and auditory priming on informational masking in speech recognition

    J. Acoust. Soc. Amer.

    (2004)
  • R.L. Freyman et al.

    The role of perceived spatial separation in the unmasking of speech

    J. Acoust. Soc. Amer.

    (1999)
  • T. Fukada et al.

    An adaptive algorithm for mel-cepstral analysis of speech

    In:...

    (1992)
  • S.A. Gelfand et al.

    Sentence reception in noise from one versus two sources: effects of aging and hearing loss

    J. Acoust. Soc. Amer.

    (1988)
  • K.W. Grant et al.

    The use of visible speech cues for improving auditory detection of spoken sentences

    J. Acoust. Soc. Amer.

    (2000)
  • K.S. Helfer

    Auditory and auditory-visual perception of clear and conversational speech

    J. Speech Lang. Hear. Res.

    (1997)
  • K.S. Helfer et al.

    Aging, spatial cues, and single- versus dual-task performance in competing speech perception

    J. Acoust. Soc. Amer.

    (2010)
  • K.S. Helfer et al.

    The role of visual speech cues in reducing energetic and informational masking

    J. Acoust. Soc. Amer.

    (2005)
  • K.S. Helfer et al.

    Aging and speech-on-speech masking

    Ear Hear.

    (2008)
  • K.S. Helfer et al.

    Lexical and indexical cues in masking by competing speech

    J. Acoust. Soc. Amer.

    (2009)