Elsevier

Speech Communication

Volume 57, February 2014, Pages 114-125
Enhancement of speech perception in noise by periodicity processing: A neurobiological model and signal processing algorithm

https://doi.org/10.1016/j.specom.2013.09.007

Highlights

  • There is a fundamental relationship between pitch strength and loudness.

  • This inspired the formulation of a new conceptual neurobiological model.

  • A computer algorithm to enhance speech in noise was based on this model.

  • Noise could be reduced by 14 dB without affecting speech intelligibility.

  • Periodicity processing may have evolved to boost the salience of vocalizations.

Abstract

The perceived loudness of sound increases with its tonality or periodicity, and the pitch strength of tones is linearly proportional to their sound pressure level. These observations suggest a fundamental relationship between pitch strength and loudness. This relationship may be explained by the superimposition of inputs to inferior colliculus neurons from cochlear nucleus chopper cells and phase locked spike trains from the lateral lemniscus. The regularity of chopper cell outputs increases for stimuli with periodicity at the same frequency as their intrinsic chopping rate. Inputs to inferior colliculus cells therefore become synchronized for periodic stimuli, which increases the likelihood that the cells will fire and increases the salience of periodic signal components at the characteristic frequency of the inferior colliculus cell. A computer algorithm to enhance speech in noise was based on this model. The periodicity of the outputs of a Gammatone filter bank after each sound onset was determined by first sampling each filter channel at a range of typical chopper cell frequencies and then passing these amplitudes through a step function to simulate the firing of coincidence detecting neurons in the inferior colliculus. Filter channel amplification was based on the maximum accumulated spike count after each onset, resulting in increased amplitudes for filter channels with greater periodicity. Speech intelligibility in noise was not changed when the algorithm was used to remove around 14 dB of noise from stimuli with signal-to-noise ratios of around 0 dB. This mechanism is a likely candidate for enhancing speech recognition in noise, and raises the proposition that pitch itself is an epiphenomenon that evolved from neural mechanisms that boost the hearing sensitivity of animals to vocalizations.
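The sampling and step-function stages described above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `max_spike_count` and `channel_gain`, the threshold, the number of periods, the candidate chopping frequencies, and the linear mapping from spike count to gain are all assumptions made for illustration, and a synthetic modulated envelope stands in for a real Gammatone filter channel. Only the 14 dB attenuation depth is taken from the text.

```python
import numpy as np

def max_spike_count(channel, fs, chop_freqs, threshold, n_periods=10):
    """Largest number of supra-threshold samples found at any candidate chopping rate."""
    best = 0
    for f in chop_freqs:
        # Sample the channel once per chopping period, starting at the onset (index 0).
        idx = np.round(np.arange(n_periods) * fs / f).astype(int)
        idx = idx[idx < len(channel)]
        # Step function: count a "spike" whenever the sampled amplitude reaches threshold.
        spikes = channel[idx] >= threshold
        best = max(best, int(spikes.sum()))
    return best

def channel_gain(count, n_periods=10, floor_db=-14.0):
    """Hypothetical linear map from spike count to a gain between floor_db and 0 dB."""
    floor = 10.0 ** (floor_db / 20.0)
    return floor + (1.0 - floor) * count / n_periods

# Synthetic stand-ins for one filter channel (assumed values, not from the paper).
fs = 16000
t = np.arange(1600) / fs
periodic = 0.5 * (1.0 + np.cos(2 * np.pi * 200 * t))   # envelope modulated at 200 Hz
noise = np.random.default_rng(0).uniform(0.0, 1.0, t.size)
chop_freqs = [100, 150, 200, 250, 300]                   # assumed chopping rates in Hz

periodic_count = max_spike_count(periodic, fs, chop_freqs, threshold=0.9)
noise_count = max_spike_count(noise, fs, chop_freqs, threshold=0.9)
print(periodic_count, channel_gain(periodic_count))
print(noise_count, channel_gain(noise_count))
```

In this sketch a channel whose modulation period matches one of the candidate chopping rates is sampled at its peaks on every period, so it accumulates the full spike count and keeps unity gain, whereas an aperiodic channel crosses the threshold only occasionally and is attenuated toward the 14 dB floor.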

Introduction

Pitch height is a psychological dimension related to the frequency or period of sounds. Pitch strength may be defined as the certainty of a pitch height (Beerends and Houtsma, 1989, Rakowski, 1996), and has been measured by pairwise comparisons between stimuli of various tonal characteristics where loudness was held constant (Fastl and Stoll, 1979). Pitch strength and loudness judgments are not independent. For pure tones, pitch strength increases linearly with sound pressure level over a range of 60 dB (Fastl, 1989), and both loudness and pitch strength increase with stimulus duration up to around 200 ms (Fastl, 1989, Viemeister and Wakefield, 1991). In contrast to the variability of pitch strength with loudness, the pitch height of a 1 kHz pure tone varies by less than 1% for sound pressure levels between 40 and 80 dB SPL (Zwicker and Fastl, 1999).

Further evidence for a relationship between pitch strength and loudness includes the finding that pure tones produce much greater loudness than noise bands (Fastl and Stoll, 1979), such that the sound pressure level of octave band noise with an embedded pure tone is 10 dB lower than that of octave band noise alone at matched loudness (Kryter and Pearsons, 1965). Soeta et al. (2004) and Kidd et al. (1989) showed that this effect was preserved even if the width of the noise band was less than one critical bandwidth. Furthermore, Soeta et al. (2007) showed that increases in the pitch strength of iterated rippled noise increased perceived loudness, thereby suggesting that waveform periodicity subserves the relationship between loudness and pitch strength. In addition, in studies of auditory evoked potentials, an N1 peak and sustained negative potentials at latencies between 400 and 800 ms were associated with periodicity processing by appending iterated rippled noise to bandpass noise with the same spectral profile (Krumbholz et al., 2003, Seither-Preisler et al., 2006). Increases in the amplitudes of the N1 peak and the longer latency sustained negative potentials are likely associated with enhanced spike rates, and were proportional to the iterated rippled noise pitch strength (Krumbholz et al., 2003, Seither-Preisler et al., 2006).

Most research on the neurobiological function of periodicity processing has focused on the importance of periodicity processing for pitch (Cariani, 2001, Dau et al., 1997, Dicke et al., 2007, Guérin et al., 2006, McLachlan, 2009, Meddis and O’Mard, 2006, Nelson and Carney, 2004). However, the ability to hear tones in noise has important implications for speech intelligibility under noisy conditions (Plomp, 1994). In support of this, the detection of common periodicity has been successfully used to segment parts of noisy speech signals that contain vowel sounds (Hu and Wang, 2008), and the use of periodicity-based features was found to enhance the automatic speech recognition rates of voiced components of speech in the presence of noise (Ishizuka and Nakatani, 2006). Other researchers have considered the role of periodicity processing in segregating vowels by autocorrelation and similar algorithms such as recurrent neural networks (Assmann and Summerfield, 1990, Cariani, 2001, de Cheveigné and Kawahara, 1999, Meddis and Hewitt, 1992), and in the formation of stable auditory images of periodic sounds through the cross-channel temporal alignment of the maxima of auditory nerve spike rates (Patterson et al., 1995).

While most techniques that attempt to improve speech intelligibility in noise seek to remove the energy associated with noise by estimating its spectrum (Yoo et al., 2007), a small number of studies have sought to enhance spectral components associated with vowels (Cheng and O’Shaughnessy, 1991, Turicchia and Sarpeshkar, 2005). However, many studies have shown that transient voiced information is very important for speech intelligibility (Hazan and Simpson, 1998, Howell and Rosen, 1983, Richardson et al., 2004, Strange et al., 1983), and algorithms that enhance speech transients given prior knowledge of the speech content have been successful in increasing recognition rates for speech in noise (Yoo et al., 2007). Such algorithms cannot recover non-voiced components of speech such as fricatives, and these components must be deduced from the voiced information when masked by noise.

This paper introduces a neurobiologically inspired model of periodicity processing (Periodic Sampling Model, PSM) that may enhance the loudness of transient formants in speech relative to noise without the long signal durations required to extract fundamental frequency information. Overall, the PSM proposes that inferior colliculus spike rates integrated over multiple waveform periods after each sound onset will be greater at specific sub-bands (or best modulation frequencies, BMF) within auditory frequency channels that contain periodic auditory nerve (AN) spike rate modulations. The specific loudness associated with amplitude modulated filter channels will be greater, in keeping with the relationship between pitch strength and loudness observed by Fastl (1989). The second part of the paper reports a pilot study in which an algorithm based on the PSM is used to enhance periodic components of speech signals embedded in pink noise and speech babble, and the intelligibility of processed and unprocessed speech stimuli is compared for normal hearing participants.

Section snippets

Neurobiological motivation

This model focuses on brainstem networks that involve the cochlear nucleus (CN), inferior colliculus (IC), and the ventral nucleus of the lateral lemniscus (VLL). The cell types that project to the IC from the CN comprise multipolar (otherwise known as stellate or chopper), fusiform, octopus, and giant cells (Cant and Benson, 2003). Sustained chopper cells exhibit highly regular firing patterns when excited near their characteristic frequency (CF), with a frequency of firing known as the cell’s

Assessment of the effect of the PSM algorithm on speech intelligibility

The PSM algorithm was used to reduce the amplitude of non-tonal signal components by 14 dB, which is greater than the 10 dB SPL increase in the perceived loudness of tonal signals reported by Kryter and Pearsons (1965). Since the algorithm replicates normal hearing mechanisms, it is unlikely to improve speech intelligibility for healthy listeners. It is therefore hypothesized that there will be no loss of intelligibility in the processed examples, despite the reduction of the amplitude of non-periodic

General discussion

Overall, the results confirm that the PSM algorithm could segregate periodic from non-periodic components of a signal within the time required to boost the salience of spoken phonemes. This indicates that it is possible to enhance the tonal components of speech and other animal vocalizations by periodicity processing in the brainstem.

The observation that word intelligibility for normal hearing listeners was not reduced by the PSM algorithm under most of the experimental conditions shows that

Acknowledgment

This work was supported by Australian Research Council Discovery Project Grants DP1094830 and DP120103039.

References (67)

  • G. Langner et al. Temporal and spatial coding of periodicity information in the inferior colliculus of awake chinchilla (Chinchilla laniger). Hear. Res. (2002)
  • N.M. McLachlan. A computational model of human pitch strength and height judgments. Hear. Res. (2009)
  • M. Milczynski et al. Perception of Mandarin Chinese with cochlear implants using enhanced temporal pitch cues. Hear. Res. (2012)
  • D. Oertel. Encoding of timing in the brain stem auditory nuclei of vertebrates. Neuron (1997)
  • A. Seither-Preisler et al. From noise to pitch: transient and sustained responses of the auditory evoked field. Hear. Res. (2006)
  • Y. Soeta et al. Loudness in relation to iterated rippled noise. J. Sound Vib. (2007)
  • I. Stiebler. Tone-threshold mapping in the inferior colliculus of the house–mouse. Neurosci. Lett. (1986)
  • D.L. Strait et al. Specialization among the specialized: auditory brainstem function is tuned in to timbre. Cortex (2012)
  • P.F. Assmann et al. Modeling the perception of concurrent vowels: vowels with different fundamental frequencies. J. Acoust. Soc. Am. (1990)
  • J.G. Beerends et al. Pitch identification of simultaneous diotic and dichotic two-tone complexes. J. Acoust. Soc. Am. (1989)
  • J. Bench et al. Speech–Hearing Tests and the Spoken Language of Hearing-Impaired Children (1979)
  • C.C. Blackburn et al. Regularity analysis in a compartmental model of chopper units in the anteroventral cochlear nucleus. J. Neurophysiol. (1991)
  • C.C. Blackburn et al. Effects of off-BF tones on responses of chopper units in ventral cochlear nucleus; I. Regularity and temporal adaptation patterns. J. Neurophysiol. (1992)
  • I. Brons et al. Perceptual effects of noise reduction with respect to personal preference, speech intelligibility, and listening effort. Ear Hear. (2013)
  • Y.M. Cheng et al. Speech enhancement based conceptually on auditory evidence. IEEE Trans. Signal Process. (1991)
  • E. Covey et al. The monaural nuclei of the lateral lemniscus in an echolocating bat: parallel pathways for analyzing temporal features of sound. J. Neurosci. (1991)
  • T. Dau et al. Modeling auditory processing of amplitude modulation. I. Detection and masking with narrow-band carriers. J. Acoust. Soc. Am. (1997)
  • U. Dicke et al. A neural circuit transforming temporal periodicity information into a rate-based representation in the mammalian auditory system. J. Acoust. Soc. Am. (2007)
  • G. Ehret et al. Spectral and intensity coding
  • Fastl, H., 1989. Pitch strength of pure tones. In: Proceedings of the 13th International Congress on Acoustics, pp....
  • M.J. Ferragamo et al. Octopus cells of the mammalian ventral cochlear nucleus sense the rate of depolarization. J. Neurophysiol. (2002)
  • R.D. Frisina et al. Encoding of amplitude modulation in the gerbil cochlear nucleus. I. A hierarchy of enhancement. Hear. Res. (1990)
  • T. Herzke et al. Improved numerical methods for Gammatone filterbank analysis and synthesis. Acta Acust. (2007)