Enhancement of speech perception in noise by periodicity processing: A neurobiological model and signal processing algorithm
Introduction
Pitch height is a psychological dimension related to the frequency or period of sounds. Pitch strength may be defined as the certainty of a pitch height (Beerends and Houtsma, 1989, Rakowski, 1996), and has been measured by pairwise comparisons between stimuli of various tonal characteristics where loudness was held constant (Fastl and Stoll, 1979). Pitch strength and loudness judgments are not independent. For pure tones, pitch strength increases linearly with sound pressure level over a range of 60 dB (Fastl, 1989), and both loudness and pitch strength increase with stimulus duration up to around 200 ms (Fastl, 1989, Viemeister and Wakefield, 1991). In contrast to the variability of pitch strength with loudness, the pitch height of a 1 kHz pure tone varies by less than 1% for sound pressure levels between 40 and 80 dB SPL (Zwicker and Fastl, 1999).
Further evidence for a relationship between pitch strength and loudness includes the finding that pure tones produce much greater loudness than noise bands (Fastl and Stoll, 1979). As a result, at matched loudness, the sound pressure level of octave band noise with an embedded pure tone is 10 dB lower than that of octave band noise alone (Kryter and Pearsons, 1965). Soeta et al. (2004) and Kidd et al. (1989) showed that this effect was preserved even when the width of the noise band was less than one critical bandwidth. Moreover, Soeta et al. (2007) showed that increasing the pitch strength of iterated rippled noise increased its perceived loudness, suggesting that waveform periodicity subserves the relationship between loudness and pitch strength. Studies of auditory evoked potentials point the same way. When iterated rippled noise was appended to bandpass noise with the same spectral profile, an N1 peak and sustained negative potentials at latencies between 400 and 800 ms were associated with periodicity processing (Krumbholz et al., 2003, Seither-Preisler et al., 2006). The amplitudes of the N1 peak and of the longer-latency sustained negative potentials were proportional to the pitch strength of the iterated rippled noise. Their increases are likely associated with enhanced spike rates (Krumbholz et al., 2003, Seither-Preisler et al., 2006).
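Iterated rippled noise (IRN) carries much of the argument above, so the following minimal sketch shows how its pitch strength can be varied. It assumes an "add-same" delay-and-add network, which is one common IRN variant. It also assumes a 16 kHz sampling rate and an 80-sample delay, which gives a 200 Hz pitch. The height of the normalized autocorrelation at the delay is used here only as a proxy for pitch strength. The function names are illustrative and do not come from the cited studies.

```python
import numpy as np

def iterated_rippled_noise(n_samples, delay, n_iter, gain=1.0, seed=0):
    """Generate IRN with an 'add-same' delay-and-add network (one common variant).

    Each iteration adds a copy of the current signal delayed by `delay` samples,
    producing a pitch at fs / delay Hz whose strength grows with n_iter.
    """
    rng = np.random.default_rng(seed)
    y = rng.standard_normal(n_samples)
    for _ in range(n_iter):
        delayed = np.zeros_like(y)
        delayed[delay:] = y[:-delay]
        y = y + gain * delayed
    return y / np.max(np.abs(y))

def normalized_autocorr(x, lag):
    """Autocorrelation at `lag`, normalized by the zero-lag value."""
    x = x - x.mean()
    return float(np.dot(x[:-lag], x[lag:]) / np.dot(x, x))

fs, delay = 16000, 80  # assumed values: 80 samples at 16 kHz -> 200 Hz pitch
for n_iter in (0, 1, 2, 4, 8):
    x = iterated_rippled_noise(2 * fs, delay, n_iter)
    print(n_iter, round(normalized_autocorr(x, delay), 3))
```

With unit gain, the expected normalized autocorrelation at the delay is n/(n+1) for n iterations. It therefore rises from about 0 for plain noise toward 1 as the number of iterations grows.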
Most research on the neurobiological function of periodicity processing has focused on the importance of periodicity processing for pitch (Cariani, 2001, Dau et al., 1997, Dicke et al., 2007, Guérin et al., 2006, McLachlan, 2009, Meddis and O’Mard, 2006, Nelson and Carney, 2004). However, the ability to hear tones in noise has important implications for speech intelligibility under noisy conditions (Plomp, 1994). In support of this, the detection of common periodicity has been successfully used to segment parts of noisy speech signals that contain vowel sounds (Hu and Wang, 2008), and the use of periodicity-based features was found to enhance the automatic speech recognition rates of voiced components of speech in the presence of noise (Ishizuka and Nakatani, 2006). Other researchers have considered the role of periodicity processing in segregating vowels by autocorrelation and similar algorithms such as recurrent neural networks (Assmann and Summerfield, 1990, Cariani, 2001, de Cheveigné and Kawahara, 1999, Meddis and Hewitt, 1992), and in the formation of stable auditory images of periodic sounds through the cross-channel temporal alignment of the maxima of auditory nerve spike rates (Patterson et al., 1995).
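As a concrete illustration of autocorrelation-based periodicity detection, the sketch below flags short frames as periodic (vowel-like) or aperiodic. It does so by taking the peak normalized autocorrelation over lags in an assumed F0 range. This is a generic detector, not the method of Hu and Wang (2008) or of the other cited models. The sampling rate, the 80 to 400 Hz F0 range, the 40 ms frame length and the 0.5 threshold are all assumptions.

```python
import numpy as np

FS = 16000  # assumed sampling rate (Hz)

def frame_periodicity(frame, min_f0=80.0, max_f0=400.0):
    """Largest normalized autocorrelation over lags spanning the assumed F0 range."""
    best = 0.0
    for lag in range(int(FS / max_f0), int(FS / min_f0) + 1):
        a, b = frame[:-lag], frame[lag:]
        denom = np.sqrt(np.dot(a, a) * np.dot(b, b))
        if denom > 0:
            best = max(best, float(np.dot(a, b) / denom))
    return best

def voiced_frames(x, frame_ms=40.0, threshold=0.5):
    """Flag each non-overlapping frame as periodic (True) or aperiodic (False)."""
    n = int(FS * frame_ms / 1000)
    return [frame_periodicity(x[i:i + n]) > threshold
            for i in range(0, len(x) - n + 1, n)]

# Usage: 0.2 s of a 200 Hz harmonic complex followed by 0.2 s of white noise.
t = np.arange(int(0.2 * FS)) / FS
vowel_like = sum(np.sin(2 * np.pi * 200 * k * t) / k for k in range(1, 6))
noise = np.random.default_rng(0).standard_normal(len(t))
flags = voiced_frames(np.concatenate([vowel_like, noise]))
print(flags)
```

Frames from the harmonic complex reach a correlation close to 1 at the 80-sample period. Noise frames stay far below the threshold.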
While most techniques that attempt to improve speech intelligibility in noise seek to remove the energy associated with noise by estimating its spectrum (Yoo et al., 2007), a small number of studies have sought to enhance spectral components associated with vowels (Cheng and O’Shaughnessy, 1991, Turicchia and Sarpeshkar, 2005). However, many studies have shown that transient voiced information is very important for speech intelligibility (Hazan and Simpson, 1998, Howell and Rosen, 1983, Richardson et al., 2004, Strange et al., 1983), and algorithms that enhance speech transients given prior knowledge of the speech content have been successful in increasing recognition rates for speech in noise (Yoo et al., 2007). Such algorithms cannot recover non-voiced components of speech such as fricatives, and these components must be deduced from the voiced information when masked by noise.
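Magnitude spectral subtraction is one textbook example of the noise-spectrum approach mentioned above. The minimal sketch below is not the method of Yoo et al. (2007). It assumes that a noise-only segment is available, and it uses non-overlapping rectangular frames for brevity. Practical systems use overlapping tapered windows to avoid frame-boundary artifacts.

```python
import numpy as np

def spectral_subtraction(noisy, noise_only, frame=512):
    """Magnitude spectral subtraction with non-overlapping rectangular frames.

    The average noise magnitude spectrum is estimated from a noise-only signal,
    subtracted from each noisy frame (floored at zero), and the noisy phase is reused.
    """
    def to_frames(x):
        n_frames = len(x) // frame
        return x[: n_frames * frame].reshape(n_frames, frame)

    noise_mag = np.abs(np.fft.rfft(to_frames(noise_only), axis=1)).mean(axis=0)
    spec = np.fft.rfft(to_frames(noisy), axis=1)
    mag = np.maximum(np.abs(spec) - noise_mag, 0.0)
    enhanced = mag * np.exp(1j * np.angle(spec))
    return np.fft.irfft(enhanced, n=frame, axis=1).ravel()

# Usage: a 500 Hz tone in white noise, with a separate noise-only segment.
fs = 16000
rng = np.random.default_rng(0)
t = np.arange(512 * 40) / fs
tone = np.sin(2 * np.pi * 500 * t)
noisy = tone + 0.5 * rng.standard_normal(len(t))
noise_only = 0.5 * rng.standard_normal(len(t))
enhanced = spectral_subtraction(noisy, noise_only)
print(np.mean((noisy - tone) ** 2), np.mean((enhanced - tone) ** 2))
```

The method removes energy wherever the noise estimate predicts it. It does so whether or not that energy overlaps speech, which is the limitation that motivates enhancing periodic components instead.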
This paper introduces a neurobiologically inspired model of periodicity processing, the Periodic Sampling Model (PSM). The PSM may enhance the loudness of transient formants in speech relative to noise without the long signal durations required to extract fundamental frequency information. It proposes that inferior colliculus spike rates, integrated over multiple waveform periods after each sound onset, will be greater at specific sub-bands (or best modulation frequencies, BMF) within auditory frequency channels that contain periodic auditory nerve (AN) spike rate modulations. The specific loudness associated with amplitude-modulated filter channels will therefore be greater, in keeping with the relationship between pitch strength and loudness observed by Fastl (1989). The second part of the paper reports a pilot study in which an algorithm based on the PSM is used to enhance periodic components of speech signals embedded in pink noise and speech babble. The intelligibility of the processed speech stimuli is compared with that of unprocessed stimuli for normal hearing participants.
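The core PSM idea can be loosely illustrated in signal-processing terms. Periodic AN rate modulations within a frequency channel concentrate envelope energy in a narrow modulation band, while noise spreads it evenly. The sketch below is a hypothetical stand-in and not the PSM circuit itself. It uses half-wave rectification plus smoothing as a crude AN-rate proxy, in place of a real auditory filter and nerve model. It integrates over a 50 ms window from onset and treats 20 Hz-wide modulation bands (`BMFS`) as the assumed BMF sub-bands. The peak-to-mean ratio across those bands serves as a periodicity index.

```python
import numpy as np

FS = 16000                      # assumed sampling rate (Hz)
BMFS = np.arange(60, 500, 20)   # hypothetical best modulation frequencies (Hz)

def an_rate_proxy(channel, smooth_ms=1.0):
    """Crude AN rate proxy: half-wave rectification plus moving-average smoothing."""
    rectified = np.maximum(channel, 0.0)
    n = max(1, int(FS * smooth_ms / 1000))
    return np.convolve(rectified, np.ones(n) / n, mode="same")

def periodicity_index(channel, window_ms=50.0, bandwidth=20.0):
    """Peak-to-mean ratio of envelope power across modulation bands,
    integrated over a window that starts at sound onset."""
    n = int(FS * window_ms / 1000)
    env = an_rate_proxy(channel)[:n]
    env = env - env.mean()
    power = np.abs(np.fft.rfft(env * np.hanning(n))) ** 2
    freqs = np.fft.rfftfreq(n, 1 / FS)
    band_power = np.array([
        power[(freqs >= f - bandwidth / 2) & (freqs < f + bandwidth / 2)].sum()
        for f in BMFS
    ])
    return float(band_power.max() / band_power.mean())

# Usage: a 1 kHz channel amplitude-modulated at 200 Hz versus a noise channel.
t = np.arange(int(0.05 * FS)) / FS
am_channel = (1 + np.cos(2 * np.pi * 200 * t)) * np.sin(2 * np.pi * 1000 * t)
noise_channel = np.random.default_rng(0).standard_normal(len(t))
print(periodicity_index(am_channel), periodicity_index(noise_channel))
```

In the AM channel, envelope power falls in the band at 200 Hz, so the index is large. The noise channel spreads power across all bands and gives a much smaller value. A channel-wise gain could then be derived from such an index.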
Neurobiological motivation
This model focuses on brainstem networks that involve the cochlear nucleus (CN), inferior colliculus (IC), and the ventral nucleus of the lateral lemniscus (VLL). The cell types that project to the IC from the CN comprise multipolar (otherwise known as stellate or chopper), fusiform, octopus, and giant cells (Cant and Benson, 2003). Sustained chopper cells exhibit highly regular firing patterns when excited near their characteristic frequency (CF), with a frequency of firing known as the cell’s
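Firing regularity of the kind attributed to sustained chopper cells is commonly quantified by the coefficient of variation (CV) of interspike intervals. The CV is the standard deviation of the intervals divided by their mean. The sketch below compares two synthetic spike trains. The first is a hypothetical chopper-like train, a regular 400 Hz train with 0.2 ms Gaussian timing jitter. The second is a Poisson train at the same mean rate. Both the rate and the jitter are assumptions chosen for illustration.

```python
import numpy as np

def isi_cv(spike_times):
    """Coefficient of variation of inter-spike intervals (std / mean)."""
    isi = np.diff(np.sort(spike_times))
    return float(isi.std() / isi.mean())

rng = np.random.default_rng(1)
period = 2.5e-3   # hypothetical 400 Hz firing rate (s)
n = 2000

# Chopper-like train: one spike per period with small Gaussian timing jitter.
chopper = np.arange(n) * period + rng.normal(0.0, 0.2e-3, n)
# Irregular comparison: Poisson train (exponential intervals) with the same mean rate.
poisson = np.cumsum(rng.exponential(period, n))

print(isi_cv(chopper), isi_cv(poisson))
```

A Poisson train has a CV of about 1. The jittered regular train gives a CV well below that, reflecting the near-constant firing period.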
Assessment of the effect of the PSM algorithm on speech intelligibility
The PSM algorithm was used to reduce the amplitude of non-tonal signal components by 14 dB, which is greater than the 10 dB increase in the perceived loudness of tonal signals reported by Kryter and Pearsons (1965). Since the algorithm replicates normal hearing mechanisms, it is unlikely to improve speech intelligibility for healthy listeners. It is therefore hypothesized that there will be no loss of intelligibility in the processed examples, despite the reduction of the amplitude of non-periodic
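For scale, a 14 dB cut corresponds to an amplitude factor of 10^(-14/20) ≈ 0.20, or a power factor of about 0.04. A 10 dB cut corresponds to an amplitude factor of about 0.32. The sketch below applies such a cut channel-wise. The `attenuate_nonperiodic` helper and its boolean per-channel mask are hypothetical, since the text does not specify how the attenuation was applied.

```python
import numpy as np

def db_to_amplitude(db):
    """Convert a level change in dB to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)

def attenuate_nonperiodic(channels, periodic_mask, atten_db=14.0):
    """Scale the rows (frequency channels) flagged as non-periodic by -atten_db.

    channels: array of shape (n_channels, n_samples)
    periodic_mask: boolean array of shape (n_channels,)
    """
    gains = np.where(periodic_mask, 1.0, db_to_amplitude(-atten_db))
    return channels * gains[:, None]

print(round(db_to_amplitude(-14.0), 3))  # 0.2
print(round(db_to_amplitude(-10.0), 3))  # 0.316
```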
General discussion
Overall, the results confirm that the PSM algorithm can segregate periodic from non-periodic components of a signal within the time required to boost the salience of spoken phonemes. This indicates that it is possible to enhance the tonal components of speech and other animal vocalizations by periodicity processing in the brainstem.
The observation that word intelligibility for normal hearing listeners was not reduced by the PSM algorithm under most of the experimental conditions shows that
Acknowledgment
This work was supported by Australian Research Council Discovery Project Grants DP1094830 and DP120103039.
References
Breaking the wave: effects of attention and learning on concurrent sound perception. Hear. Res. (2007).
Parallel auditory pathways: projection patterns of the different neuronal populations in the dorsal and ventral cochlear nuclei. Brain Res. Bull. (2003).
Neural timing nets. Neural Networks (2001).
Multiple period estimation and pitch perception model. Speech Commun. (1999).
Scaling of pitch strength. Hear. Res. (1979).
Evaluation of two computational models of amplitude modulation coding in the inferior colliculus. Hear. Res. (2006).
The effect of cue-enhancement on the intelligibility of nonsense word and sentence materials presented in noise. Speech Commun. (1998).
Perception of rise time and explanations of the affricate/fricative contrast. Speech Commun. (1983).
Temporal integration in absolute pitch identification of absolute pitch. Hear. Res. (2007).
A feature extraction method using subband based periodicity and aperiodicity decomposition with noise robust frontend processing for automatic speech recognition. Speech Commun. (2006).
Temporal and spatial coding of periodicity information in the inferior colliculus of awake chinchilla (Chinchilla laniger). Hear. Res.
A computational model of human pitch strength and height judgments. Hear. Res.
Perception of Mandarin Chinese with cochlear implants using enhanced temporal pitch cues. Hear. Res.
Encoding of timing in the brain stem auditory nuclei of vertebrates. Neuron.
From noise to pitch: transient and sustained responses of the auditory evoked field. Hear. Res.
Loudness in relation to iterated rippled noise. J. Sound Vib.
Tone-threshold mapping in the inferior colliculus of the house–mouse. Neurosci. Lett.
Specialization among the specialized: auditory brainstem function is tuned in to timbre. Cortex.
Modeling the perception of concurrent vowels: vowels with different fundamental frequencies. J. Acoust. Soc. Am.
Pitch identification of simultaneous diotic and dichotic two-tone complexes. J. Acoust. Soc. Am.
Speech–Hearing Tests and the Spoken Language of Hearing-Impaired Children.
Regularity analysis in a compartmental model of chopper units in the anteroventral cochlear nucleus. J. Neurophysiol.
Effects of off-BF tones on responses of chopper units in ventral cochlear nucleus; I. Regularity and temporal adaptation patterns. J. Neurophysiol.
Perceptual effects of noise reduction with respect to personal preference, speech intelligibility, and listening effort. Ear Hear.
Speech enhancement based conceptually on auditory evidence. IEEE Trans. Signal Process.
The monaural nuclei of the lateral lemniscus in an echolocating bat: parallel pathways for analyzing temporal features of sound. J. Neurosci.
Modeling auditory processing of amplitude modulation. I. Detection and masking with narrow-band carriers. J. Acoust. Soc. Am.
A neural circuit transforming temporal periodicity information into a rate-based representation in the mammalian auditory system. J. Acoust. Soc. Am.
Spectral and intensity coding.
Octopus cells of the mammalian ventral cochlear nucleus sense the rate of depolarization. J. Neurophysiol.
Encoding of amplitude modulation in the gerbil cochlear nucleus. I. A hierarchy of enhancement. Hear. Res.
Improved numerical methods for Gammatone filterbank analysis and synthesis. Acta Acust.