Elsevier

NeuroImage

Volume 27, Issue 3, September 2005, Pages 533-543

Neuromagnetic responses reflect the temporal pitch change of regular interval sounds

https://doi.org/10.1016/j.neuroimage.2005.05.003

Abstract

The pitch onset response (POR) evoked by the transition between two regular interval sounds (RIS) of different pitch was studied by recording neuromagnetic responses with a 122-channel whole-head magnetoencephalograph (MEG).

The parameters of RIS were varied, giving rise to characteristic changes in the latency of the first prominent deflection, which occurs about 100 to 140 ms after the transition. These latency differences of the neurophysiological signal correlated strongly with psychoacoustic findings obtained from the same individuals. Some of the observed changes cannot be explained by obvious physical differences, such as changes in the spectrum, but only by temporal processing mechanisms such as those of the auditory image model (Patterson, R.D., Allerhand, M., Giguere, C., 1995. Time-domain modelling of peripheral auditory processing: a modular architecture and a software platform. J. Acoust. Soc. Am. 98, 1890–1894). The POR evoked by the transition was localized in lateral Heschl's Gyrus, which gives further evidence that this region is the center for processing pitch changes in the auditory cortex.

Introduction

Pitch perception is important for extracting information from speech and music. Most sounds in our environment contain energy across a wide range of frequencies but still elicit a salient pitch percept. The question of how pitch is encoded by the auditory system has led to two theories: place coding and time coding. The place theory is based on the tonotopic organization of the cochlea, which is maintained throughout the ascending auditory pathway up to the auditory cortex (Brugge, 1985; Romani et al., 1982). The time theory is based on the temporal regularity of sounds: the stimulus-dependent motion of the basilar membrane leads to phase-locked action potentials in the auditory nerve fibers, which preserve the temporal regularity of the signal. Licklider (1951) proposed that pitch perception is mediated by a neural autocorrelation process in the time domain. The autocorrelation function is the inverse Fourier transform of the power spectrum (Wiener–Khintchine relation). If the first peak of the autocorrelation function after lag zero lies at time lag τ, the perceived pitch has a frequency of $1/\tau$. Patterson et al. (1995) extended this measure of perceived pitch in their temporal pitch model by strobed temporal integration over a buffer that decays in time.
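Licklider's autocorrelation account can be illustrated with a short sketch. The Python code below is a minimal illustration, not Licklider's or the authors' model: it computes the autocorrelation directly in the time domain (by the Wiener–Khintchine relation this is equivalent to inverse-transforming the power spectrum) and, as a simplifying assumption, takes the lag of the largest autocorrelation value within a plausible pitch range set by the hypothetical parameters `fmin` and `fmax`. The test signal, a 200 Hz harmonic complex sampled at 8 kHz, is also an assumption chosen for illustration.

```python
import math


def autocorr(x, max_lag):
    """Unnormalised autocorrelation r[tau] = sum_t x[t] * x[t + tau], tau = 0..max_lag."""
    n = len(x)
    return [sum(x[t] * x[t + tau] for t in range(n - tau)) for tau in range(max_lag + 1)]


def acf_pitch(x, fs, fmin=80.0, fmax=400.0):
    """Estimate pitch as fs / tau_best, where tau_best is the lag (in samples) of the
    largest autocorrelation value among lags corresponding to [fmin, fmax] Hz."""
    min_lag = int(fs / fmax)  # shortest lag considered (highest pitch)
    max_lag = int(fs / fmin)  # longest lag considered (lowest pitch)
    r = autocorr(x, max_lag)
    best = max(range(min_lag, max_lag + 1), key=lambda tau: r[tau])
    return fs / best


fs = 8000.0  # sampling rate in Hz (illustrative)
f0 = 200.0   # fundamental of the test signal in Hz
# Harmonic complex with harmonics 1..5 of f0, 800 samples = 0.1 s long.
x = [sum(math.cos(2 * math.pi * k * f0 * t / fs) for k in range(1, 6)) for t in range(800)]
print(acf_pitch(x, fs))  # 200.0
```

Because the signal repeats every 40 samples, the autocorrelation is largest at lag 40, so the estimate is 8000 / 40 = 200 Hz.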

Most complex sounds that produce a pitch show maxima in their auditory spectrum at higher harmonics of the fundamental frequency, and their autocorrelogram has a main peak at the period of the sound. The perceived pitch can therefore be explained by both spectral and temporal auditory mechanisms. Regular interval sounds (RIS) are a class of sounds that have been used to investigate temporal cues (e.g. Griffiths et al., 1998). RIS are generated from noise as follows: the noise signal Φ(t) is multiplied by a gain factor g, delayed by a time d, and added back to the original noise; this process is repeated n times. For a positive gain, the perceived pitch of RIS corresponds to the inverse of the delay, 1/d, and the pitch strength increases with the number of iterations n (Yost, 1996b). We use the notation RIS(d,g,n) throughout the present paper. The amplitude y(t) of RIS is given by $y(t) = \Phi(t) + \sum_{k=1}^{n} g^{k}\,\Phi(t - kd)$.
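The delay-and-add construction of RIS(d,g,n) can be written as a direct implementation of the formula for y(t). This is a minimal pure-Python sketch, not the authors' stimulus code: it assumes the noise is zero before its first sample (so early samples receive fewer delayed copies), that d is given as an integer number of samples, and it uses Gaussian white noise as an illustrative choice. `make_ris` is a hypothetical name.

```python
import random


def make_ris(noise, delay_samples, g, n):
    """Return y[t] = noise[t] + sum_{k=1..n} g**k * noise[t - k*delay_samples].

    Samples of `noise` before index 0 are treated as zero."""
    y = list(noise)
    for k in range(1, n + 1):
        shift = k * delay_samples  # total delay of the k-th copy, in samples
        gain = g ** k              # gain of the k-th copy
        for t in range(shift, len(noise)):
            y[t] += gain * noise[t - shift]
    return y


# Example: RIS(2 ms, +1, 8) at a 48 kHz sampling rate, i.e. d = 96 samples.
fs = 48000
d_samples = round(0.002 * fs)
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(fs // 10)]  # 100 ms of Gaussian white noise
ris = make_ris(noise, d_samples, g=+1, n=8)
```

Feeding a unit impulse through `make_ris` shows the structure directly: with g = −1 and n = 2 the output contains +1, −1 and +1 at delays 0, d and 2d.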

The power spectra of RIS (Fig. 1, top to bottom) are comb-shaped and approach a line spectrum for a high number of iterations n. For a positive g (Fig. 1, left), the spectra peak at integer multiples of the inverse of the delay ($f = 1/d, 2/d, 3/d, \ldots$), and the autocorrelation function exhibits a peak at time lag $\tau = 1/f_0 = d$, where $f_0 = 1/d$.

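If the gain is negative (Fig. 1, right), the spectra peak at odd multiples of $1/(2d)$ ($f = 1/(2d), 3/(2d), 5/(2d), \ldots$); the first positive peak in the autocorrelation function is then at $\tau = 2/f_0 = 2d$, twice the lag obtained for positive gain.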
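The peak positions for positive and negative gain follow from the transfer function of the delay-and-add network. The sketch below assumes the network described by the formula for y(t), so that $H(f) = \sum_{k=0}^{n} g^{k} e^{-i 2\pi f k d}$, and ignores the noise itself: it evaluates |H(f)| at a few frequencies for d = 4 ms and computes the autocorrelation of the filter's impulse response at lags d and 2d, which for white-noise input is proportional to the expected autocorrelation of the RIS. The function names are hypothetical.

```python
import cmath
import math


def ris_gain(f, d, g, n):
    """|H(f)| for H(f) = sum_{k=0..n} g**k * exp(-2j*pi*f*k*d), with d in seconds."""
    return abs(sum(g ** k * cmath.exp(-2j * math.pi * f * k * d) for k in range(n + 1)))


def ris_acf(m, g, n):
    """Autocorrelation of the impulse response (tap g**k at delay k*d) at lag m*d."""
    taps = [g ** k for k in range(n + 1)]
    return sum(taps[k] * taps[k + m] for k in range(n + 1 - m))


d, n = 0.004, 16  # 4 ms delay, so 1/d = 250 Hz and 1/(2d) = 125 Hz
for g in (+1, -1):
    gains = [round(ris_gain(f, d, g, n), 3) for f in (125.0, 250.0, 375.0, 500.0)]
    print(g, gains, ris_acf(1, g, n), ris_acf(2, g, n))
# 1 [1.0, 17.0, 1.0, 17.0] 16 15
# -1 [17.0, 1.0, 17.0, 1.0] -16 15
```

For g = +1 the spectrum peaks at 250 and 500 Hz and the autocorrelation is already positive at lag d; for g = −1 the peaks move to 125 and 375 Hz, the autocorrelation is negative at lag d, and the first positive peak appears at 2d.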

The pitch evoked by RIS(d,−1,n), i.e., RIS generated with a gain factor of minus one, has been examined in several psychoacoustic studies. Yost et al. (1978) reported that RIS generated with a negative gain factor (g = −1) and n > 4 produce a pitch an octave below that of the corresponding RIS with g = +1. For RIS generated with fewer than four iterations, however, they reported the perceived pitch not in the expected region of $f = 1/(2d)$ but around $f = 1/(0.9d)$ and $f = 1/(1.1d)$, independent of the delay time d.

Raatgever and Bakkum (1986) reported different results from their pitch-matching experiment, which used an infinite number of iterations: the perception of RIS(d,−1,∞) depended on the delay d. Pitch matches at $f = 1/(2d)$ occurred only for delay times shorter than 6 ms; with increasing delay, pitch was matched around $f = 1/(0.9d)$ and $f = 1/(1.1d)$ compared with RIS(d,+1,∞). Pitch matching is very difficult, and subjects are often unable to perform the task; in the study of Yost (1996a), four of six subjects were excluded.
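As a worked example of the pitch values discussed above, the sketch below tabulates, for the delays of 2, 4, 8 and 16 ms examined in this study, the pitch 1/d of RIS with positive gain, the octave-lower value 1/(2d), and the two ambiguous values 1/(1.1d) and 1/(0.9d). The function name is hypothetical, and the code only performs the arithmetic; it does not model perception.

```python
def candidate_pitches(d_ms):
    """Pitch values (Hz) discussed for RIS with delay d, given in milliseconds."""
    d = d_ms / 1000.0  # delay in seconds
    return {
        "1/d": 1.0 / d,               # pitch of RIS(d, +1, n)
        "1/(2d)": 1.0 / (2.0 * d),    # one octave below 1/d
        "1/(1.1d)": 1.0 / (1.1 * d),  # ambiguous pitch, roughly 10% below 1/d
        "1/(0.9d)": 1.0 / (0.9 * d),  # ambiguous pitch, roughly 10% above 1/d
    }


for d_ms in (2, 4, 8, 16):
    values = candidate_pitches(d_ms)
    print(f"d = {d_ms:2d} ms: " + ", ".join(f"{k} = {v:.1f} Hz" for k, v in values.items()))
```

For d = 2 ms, for example, the octave-lower pitch is 250 Hz, whereas the two ambiguous candidates lie near 454.5 Hz and 555.6 Hz, bracketing the 500 Hz pitch of the positive-gain RIS.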

Several electrophysiological studies have investigated the processing and representation of pitch at different stages of the auditory pathway (e.g. Shofner, 1999) and in the auditory cortex (Griffiths et al., 1998). Pantev et al. (1988) and Roberts and Poeppel (1996) used MEG to show that the transient N100 deflection evoked by sinusoidal tones depends on tone frequency; these studies reported a correlation between frequency and N100 latency. Ragot and Lepaul-Ercole (1996) used EEG to show that, for harmonic series with varying spectra, the latency of the N100 depends only on the fundamental frequency and not on the details of the spectrum. Other studies (e.g. Stufflebeam et al., 1998) revealed that the N100 is also sensitive to other stimulus features, such as intensity. Lütkenhöner et al. (2001) proposed that the N100 arises from multiple sources. The specific response to pitch onset was first isolated by Mäkelä et al. (1988), who used continuous stimulation with transitions from noise to square waves. This design avoids responses to the mere onset of energy flux, so that responses to the pitch onset can be extracted. About 100 ms after the transition, a prominent deflection (N100m′) was found to be sensitive to pitch height and pitch salience. In a recent MEG study, Krumbholz et al. (2003) were the first to apply transitions from noise to RIS generated with a positive gain factor in order to isolate the POR to the onset of RIS. They reported that the N100m′ response is specifically involved in pitch processing and localized its source in medial Heschl's Gyrus.

The aim of this paper is to correlate the neuromagnetic responses evoked by RIS with psychoacoustic results. Particular attention is paid to the perceived pitch shift between RIS generated with positive and negative g. In view of the existing controversy over the psychoacoustic results, the neurophysiological responses and the perception are investigated not only as a function of g but also of the delay time d and the number of iterations n. We also included n = 4096 iterations, for which the power spectrum of RIS approaches a line spectrum.

In the first experiment, we used MEG to record the neuromagnetic responses evoked by RIS. We used continuous stimulation to extract the pitch onset response (POR). In contrast to earlier studies, we concatenated RIS with fixed n and d but alternating sign of g, to avoid responses evoked by energy onset (see Fig. 2). In the psychoacoustic experiment, we circumvented the reported difficulty of pitch matching (Yost, 1996a) by using a two-alternative forced-choice task. A scale for the relative pitch of RIS was derived with the Bradley–Terry–Luce (BTL) method of paired comparison (David, 1988). A strong correlation was found between the evoked neurophysiological responses and the perceived pitch of RIS. In the simulation part, we show that the temporal pitch model introduced by Patterson et al. (1995) can predict the perceived octave shift and the transition in perception to ambiguous pitches in the region of ±10% between RIS generated with positive and negative gain.
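The Bradley–Terry–Luce model assigns each stimulus a positive strength $p_i$ such that the probability of choosing stimulus i over stimulus j (here, judging it higher in pitch) is $p_i/(p_i + p_j)$; the scale value of a stimulus is $\log p_i$. The sketch below estimates the strengths from a matrix of paired-comparison counts with the standard minorization-maximization (Zermelo) iteration. It is an illustrative assumption, not necessarily the procedure described by David (1988) or used by the authors; `fit_btl` is a hypothetical name, and the fit requires that every stimulus wins at least once and that all stimuli are linked by comparisons.

```python
import math


def fit_btl(wins, iters=500):
    """Fit Bradley-Terry-Luce strengths by the MM (Zermelo) iteration
    p_i <- W_i / sum_j n_ij / (p_i + p_j).

    wins[i][j] is the number of trials on which item i was chosen over item j.
    Returns log-strengths centred on zero (an interval-scale value per item)."""
    m = len(wins)
    p = [1.0] * m
    for _ in range(iters):
        new_p = []
        for i in range(m):
            w_i = sum(wins[i][j] for j in range(m) if j != i)  # total wins of item i
            denom = sum(
                (wins[i][j] + wins[j][i]) / (p[i] + p[j])  # n_ij / (p_i + p_j)
                for j in range(m)
                if j != i
            )
            new_p.append(w_i / denom)
        total = sum(new_p)
        p = [v * m / total for v in new_p]  # fix the arbitrary overall scale
    logs = [math.log(v) for v in p]
    mean = sum(logs) / m
    return [v - mean for v in logs]


# Hypothetical counts for three stimuli (row chosen over column).
wins = [
    [0, 7, 9],
    [3, 0, 8],
    [1, 2, 0],
]
print(fit_btl(wins))
```

Centring the log-strengths reflects the fact that BTL scale values are defined only up to an additive constant; differences between them are what carry meaning.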


Subjects

Twenty adult listeners (ten male, ten female) with no reported history of peripheral or central hearing disorder participated in both experiments after giving informed consent. The mean age (±standard deviation) was 33 (±9) years. During the MEG sessions, subjects watched a silent movie of their own choice. They were asked to ignore the stimuli and to concentrate on the movie.

MEG stimuli

Digitally generated white noise at a sampling rate of 48,000 Hz was used to produce RIS. The gain g was

Source analysis

The averaged neuromagnetic responses evoked by RIS(2,+1,8) produced a good signal-to-noise ratio (SNR) with a consistent fit for each subject. Therefore, this condition was always used for fitting the two-dipole model. In both hemispheres, the averaged dipoles of the equivalent source waveforms were located in the lateral portion of Heschl's Gyrus (HG) in the primary auditory cortex (left: x = −54(±8), y = −9(±7), z = 12(±11) and right: 54(±7), −11(±6), 14(±8), brackets indicate the standard

Simulation of RIS with the auditory image model

In this section, we give a short description of how the auditory image model (AIM) (Patterson et al., 1995) predicts the perceived pitch elicited by RIS generated with positive or negative gain. AIM is based on the simulation of the spectral analysis performed along the basilar membrane using a bank of auditory band-pass filters: the neural transduction process of the inner hair cells and the primary auditory fibers is simulated in each of the frequency channels defined by the filters. In the

Discussion

The aim of this paper is to investigate the correlation between neuromagnetic responses and the perceived pitch of RIS. Our results reveal a remarkable agreement between the neurophysiological and the psychoacoustic results (Fig. 6). The POR evoked by RIS generated with opposite signs of the gain factor g and delay times of 2 and 4 ms shows significant differences in the latency of the N100m′. The corresponding BTL-scaled relative pitch of RIS also differs significantly for

Conclusion

The pitch-specific neurophysiological N100m′ component is investigated in the absence of changes in the spectral envelope of the presented sounds, since RIS/RIS transitions are applied. The latency of the observed POR correlates highly with our psychoacoustic measurements of the perceived pitch. The latency difference of the N100m′ between RIS generated with opposite signs of the gain factor is significant when the delay time of RIS is set to 2 or 4 ms, but within errors for delay times of 8 and 16

Acknowledgment

This work was supported by the Deutsche Forschungsgemeinschaft (Ru 652/3-1 and Ri 1229/1-1).

References (36)

  • H.A. David. The Method of Paired Comparisons (1988).
  • B. Efron et al. An Introduction to the Bootstrap (1993).
  • T.D. Griffiths et al. Analysis of the temporal structure in sound by the human brain. Nat. Neurosci. (1998).
  • K. Krumbholz et al. Neuromagnetic evidence for a pitch processing center in Heschl's Gyrus. Cereb. Cortex (2003).
  • C.M. Leonard et al. Normal variation in the frequency and location of human auditory cortex landmarks. Heschl's Gyrus: where is it? Cereb. Cortex (1998).
  • J.C.R. Licklider. The duplex theory of pitch perception. Experientia (1951).
  • B. Lütkenhöner et al. Latency of auditory evoked field deflection N100m ruled by pitch or spectrum? Audiol. Neuro-otol. (2001).
  • R. Näätänen et al. The N1 wave of the human electric and magnetic response to sound: a review and an analysis of the component structure. Psychophysiology (1987).