NeuroImage

Volume 22, Issue 2, June 2004, Pages 755-766

Temporal dynamics of pitch in human auditory cortex

https://doi.org/10.1016/j.neuroimage.2004.01.025

Abstract

Recent functional imaging studies have shown that sounds with temporal pitch produce selective activation in anterolateral Heschl's gyrus. This paper reports a magnetoencephalographic (MEG) study of the temporal dynamics of this activation. The cortical response specific to pitch was isolated from the intensity-related response in Planum temporale using a ‘continuous stimulation’ paradigm in which regular and irregular click trains alternate without interruption. The mean interclick interval (ICI) was 6, 12, 24, or 48 ms; the train length was 720 ms. The auditory sustained field serves as a level-dependent baseline that enhances the signal-to-noise ratio over previous techniques.

The onset of pitch was accompanied by a prominent transient field, followed by a strong sustained field, both of which were associated with sources in lateral Heschl's gyrus. The sustained field rose from baseline about 70 ms after the onset of temporal regularity, asymptoted at about 450 ms, and commenced its return to baseline about 70 ms after pitch offset. The peak of the transient field occurred between 130 and 190 ms after regularity onset depending on the ICI.

The latencies of the cortical pitch response are substantially longer than might be anticipated from temporal models of pitch perception. This finding suggests that the temporal integration associated with periodicity processing occurs in a subcortical structure, and that the cortical responses reflect subsequent processes involving the measurement of pitch values and changes in pitch.

Introduction

There has recently been a series of brain imaging studies of temporal pitch processing in cerebral cortex. The studies focus on broadband stochastic sounds with varying degrees of temporal regularity, which have the advantage that both the pitch and pitch strength can be varied without affecting the average distribution of energy over frequency and time. These sounds make it possible to isolate the neural response associated with the perception of pitch, or a change in pitch, from the general response to the onset and presence of a sound. The initial studies employed regular interval sounds (RIS) in which the temporal fine structure of a random noise is regularized by making a copy of the noise, delaying it, and adding it back to the original noise, repeatedly (Yost et al., 1996). Positron emission tomography (PET) and functional magnetic resonance imaging (fMRI) (Griffiths et al., 1998, 2001; Patterson et al., 2002) were used to show that there was selective activation for pitch in a region of Heschl's gyrus (HG), anterior and lateral to primary auditory cortex, and that the level of activation increased with the degree of temporal regularity (i.e., the pitch strength). Selective activation for melody appeared in regions beyond HG in Planum polare and the superior temporal gyrus.
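The delay-and-add construction of RIS described above can be sketched in a few lines. This is a minimal illustration, not the stimulus code used in the cited studies: the function names, the sample counts, the unit gain, and the choice to feed each stage's output into the next stage are all assumptions made for the example. The normalized autocorrelation at the delay is used here only as a rough proxy for the degree of temporal regularity.

```python
import random

def regular_interval_sound(n_samples, delay_samples, n_iterations, gain=1.0, seed=0):
    """Delay-and-add a Gaussian noise with itself, repeatedly (hypothetical helper)."""
    rng = random.Random(seed)
    signal = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    for _ in range(n_iterations):
        # Copy the current signal, delay it, and add it back.
        delayed = [0.0] * delay_samples + signal[:n_samples - delay_samples]
        signal = [s + gain * d for s, d in zip(signal, delayed)]
    return signal

def normalized_autocorrelation(signal, lag):
    """Autocorrelation at one lag, normalized by the signal energy."""
    n = len(signal) - lag
    numerator = sum(signal[i] * signal[i + lag] for i in range(n))
    energy = sum(x * x for x in signal)
    return numerator / energy

# Assumed example values: 8000 samples, a delay of 80 samples, 8 iterations.
ris = regular_interval_sound(8000, 80, 8)
noise = regular_interval_sound(8000, 80, 0)  # zero iterations: plain noise
ris_regularity = normalized_autocorrelation(ris, 80)
noise_regularity = normalized_autocorrelation(noise, 80)
```

With more iterations the autocorrelation at the delay grows, which corresponds to the increase in pitch strength with temporal regularity mentioned in the text, while the noise with no iterations shows no peak at that lag.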

These studies prompted Gutschalk et al. (2002) to measure the sustained fields (SF) produced by click trains using magnetoencephalography (MEG), and to show that regular click trains, which are perceived to have a sustained temporal pitch, produce a markedly stronger SF than irregular click trains that do not have a pitch. The sensor waveform produced by a regular click train is illustrated by the bold gray line in the left panel of Fig. 1, while that produced by an irregular click train is shown by the thin gray line. (The figure introduces the fields and sources that are the subject of this paper along with the terminology.) After click train onset, the auditory evoked field (AEF) begins with a succession of transient fields (TF) consisting mainly of the peaks P1m, N1m, and P2m. The SF overlaps with these peaks at its onset around 300 ms, after which it rises to steady state around 400 ms, where it stays until the stimulus ends. The analysis of the data revealed that the sensor waves can be explained by the combination of an ‘anterior source’ in lateral HG and a ‘posterior source’ in Planum temporale (PT). Source waves for regular and irregular click trains are illustrated in the right-hand column of Fig. 1. The strength of the SF in the anterior source varied with the degree of temporal regularity in the click train, but the source was insensitive to sound level over a wide range. The SF of the posterior source varied with level but was insensitive to temporal regularity. The double dissociation of responses made it possible to separate these relatively close sources with confidence, and once separated, it became clear that the generators of the N1m are also affected by the regularity of the click train. To differentiate the components of the N1m, we refer to them by their typical peak latency (e.g., N130m and N110m), as suggested by Picton et al. (1995).

Subsequently, Krumbholz et al. (2003) used MEG to demonstrate that the perception of the onset of pitch was accompanied by a transient, surface-negative magnetic field whose source was also in lateral HG. This transient field was isolated by prepending a spectrally matched noise to the RIS. The pitch onset occurs without a concomitant change in level, which avoids the production of the TFs associated with changes in intensity (Biermann and Heil, 2000; Mäkelä et al., 1988). The latency of the peak of the TF was in the range 100–200 ms and it varied inversely with the pitch of the sound. Moreover, the amplitude of the TF varied directly with the pitch strength, indicating that the TF is, indeed, associated with the onset of the perception of pitch. They called this TF a pitch onset response (POR).

In this paper, we use MEG and continuous alternation of regular and irregular click trains to investigate the temporal properties of the SF and TF simultaneously, to compare the locations of their sources, and their relationship to other components of the AEF. If the posterior source is not affected by regularity, the SF will be the same during the regular and irregular intervals, and we can eliminate the posterior SF from the analysis by setting the measurement baseline during the irregular interval. The activity during the regular click train can then be associated with the anterior source. The AEF evoked during the regular interval of a continuously alternating click train is illustrated by the black waves in Fig. 1.
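As an illustration of the continuous alternation of regular and irregular click trains, the sketch below generates click times for a sequence of 720-ms segments with a mean ICI of 12 ms. The text here does not specify how the irregular intervals were drawn, so the uniform jitter between 0.5 and 1.5 times the mean ICI is an assumption, as are the function names and the random seed.

```python
import random

def click_times(duration_ms, mean_ici_ms, regular, rng, start_ms=0.0):
    """Click times (ms) for one train; irregular trains use jittered intervals."""
    times = []
    t = start_ms
    end_ms = start_ms + duration_ms
    while t < end_ms:
        times.append(t)
        if regular:
            t += mean_ici_ms
        else:
            # Assumed jitter rule: uniform intervals with the same mean ICI.
            t += rng.uniform(0.5 * mean_ici_ms, 1.5 * mean_ici_ms)
    return times

def alternating_sequence(n_cycles, train_ms=720.0, mean_ici_ms=12.0, seed=0):
    """Irregular and regular trains alternating without interruption."""
    rng = random.Random(seed)
    segments = []
    start_ms = 0.0
    for _ in range(n_cycles):
        for regular in (False, True):
            times = click_times(train_ms, mean_ici_ms, regular, rng, start_ms)
            segments.append((regular, times))
            start_ms += train_ms
    return segments

segments = alternating_sequence(n_cycles=2)
```

Because the two kinds of train share the same mean click rate, the level-related response should be similar in both intervals, which is what allows the irregular interval to serve as the measurement baseline.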

In the imaging studies involving RIS (Griffiths et al., 1998, 2001; Krumbholz et al., 2003; Patterson et al., 2002), the auditory image model (AIM) of Patterson et al. (1992, 1995) was used to simulate the internal representation of the sound, and to generate hypotheses about where different aspects of the processing might be in the auditory pathway. A brief description of the AIM is presented here to illustrate how time-domain auditory models can be used to interpret the transient and sustained components of the AEF and their relationships.

Time-domain models like AIM simulate the spectral analysis performed in the cochlea by the basilar membrane and outer hair cells using a bank of 75–100 band-pass auditory filters. It is this process that creates the tonotopic dimension of auditory processing. Then, in each of the frequency channels defined by a filter, the model simulates the neural transduction performed by the inner hair cells and primary auditory fibers, using a unipolar, compressive firing mechanism that effectively records the times of the peaks in the wave flowing from that particular filter. Examples of the basilar membrane motion and neural activity patterns produced in response to a natural vowel with a temporal pitch of 125 Hz are presented in Patterson et al. (1995, Fig. 2).
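The transduction step in a single frequency channel can be caricatured as half-wave rectification, compression, and peak picking. The sketch below is not AIM's implementation: it omits the filterbank entirely, stands in a 100-Hz sinusoid for the output of one filter, and uses a square-root compression chosen only for illustration. All names and parameter values are assumptions.

```python
import math

def transduce(channel_wave, compression_exponent=0.5):
    """Half-wave rectify, then compress, one filter channel's output."""
    return [max(x, 0.0) ** compression_exponent for x in channel_wave]

def peak_times_ms(activity, sample_rate_hz):
    """Times (ms) of local maxima in the unipolar activity."""
    times = []
    for k in range(1, len(activity) - 1):
        if activity[k] > activity[k - 1] and activity[k] >= activity[k + 1]:
            times.append(1000.0 * k / sample_rate_hz)
    return times

# Assumed stand-in for one channel: 30 ms of a 100-Hz sinusoid at 10 kHz.
sample_rate_hz = 10000
channel_wave = [math.sin(2 * math.pi * 100 * k / sample_rate_hz) for k in range(300)]
peaks = peak_times_ms(transduce(channel_wave), sample_rate_hz)
```

The recorded peak times are spaced by the period of the wave in the channel, which is the time-interval information that the next stage of the model is assumed to evaluate.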

In the next stage of processing, the system is assumed to evaluate the time-interval information in each frequency channel by computing something like an interspike interval histogram (IIH) for the channel, and it is here that the responses to regular and irregular click trains become markedly different. The IIH evoked by a regular click train shows a pronounced peak at the period of the train in virtually every frequency channel, whereas there are no enduring peaks in any of the IIHs for an irregular click train. The array of IIHs is called an ‘auditory image’ and it is this representation that gives the model its name. The brain imaging studies with RIS were partly motivated by the desire to delimit the location of this representation in the auditory pathway, since it plays a central role in this time-domain model of auditory perception. Examples of the neural activity patterns and auditory images produced in response to different samples of RIS are presented in Griffiths et al. (1998, 2001), Patterson et al. (2002, Fig. 1), and Krumbholz et al. (2003, Figs. 1 and 2).
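The contrast between regular and irregular click trains in an interval histogram can be illustrated with a toy calculation. The sketch below treats the click times themselves as the event times in one channel, which is a simplification, and builds an all-order interval histogram with 1-ms bins. The jitter rule for the irregular train (uniform intervals between 6 and 18 ms), the 50-ms interval limit, and the function name are assumptions, and the code is not AIM's actual integration mechanism.

```python
import random
from collections import Counter

def interval_histogram(event_times_ms, max_interval_ms=50.0, bin_ms=1.0):
    """All-order histogram of intervals between events, up to max_interval_ms."""
    counts = Counter()
    n = len(event_times_ms)
    for i in range(n):
        for j in range(i + 1, n):
            interval = event_times_ms[j] - event_times_ms[i]
            if interval > max_interval_ms:
                break  # event times are sorted, so later intervals are longer
            counts[round(interval / bin_ms)] += 1
    return counts

rng = random.Random(1)
regular = [12.0 * k for k in range(60)]  # ICI of 12 ms
irregular = [0.0]
while len(irregular) < 60:
    irregular.append(irregular[-1] + rng.uniform(6.0, 18.0))  # mean ICI of 12 ms

regular_hist = interval_histogram(regular)
irregular_hist = interval_histogram(irregular)
```

In this toy case the regular train concentrates its intervals in the bins at 12 ms and its multiples, whereas the intervals of the irregular train are spread thinly across many bins, so no bin accumulates a comparable count.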

In the MEG study of Gutschalk et al. (2002), AIM was used to illustrate the neural activity patterns and auditory images produced by regular and irregular click trains (their Fig. 1). There is a vertical ridge of activity in the auditory image of the regular click train, centered on its ICI (5 ms), and no corresponding feature in the auditory image of the irregular click train. Gutschalk et al. (2002) conclude that the pitch-related SF observed in lateral HG represents a process that follows the construction of the auditory image, and that it might represent the integration of pitch information across frequency channels, and/or the calculation of the specific pitch value. Patterson et al. (2002) argue that the fMRI activity produced by RIS in lateral HG leads to similar conclusions.

AIM preserves the temporal information necessary to explain the transient components of the AEF as well as the SF, as would virtually any time-domain model of pitch perception, and so these models also provide a basis for interpreting the dynamics of the AEF. Patterson et al. (1992) argued that the dynamics of auditory perception indicate that activity in the auditory image builds up and decays exponentially, with a half life of 30 ms, a time constant that is much longer than those observed in the filtering and neural transduction processes. The black curve in Fig. 2 shows the height of the vertical ridge in the auditory image, as a function of time, in response to a regular click train with a duration of 720 ms and an ICI of 12 ms (a pitch of 83 Hz). The ordinate is inverted so that the curve goes down when pitch strength increases, to facilitate comparison with the MEG response in lateral HG. The height of the pitch ridge increases (in the negative direction) over the first 100 ms of the sound and then asymptotes to a fixed level that is sustained for the duration of the sound. When the regular click train terminates, the height of the ridge decays back to baseline at the same rate as it grew when the pitch came on. This suggests that there should be a source that produces a SF with a time course resembling the black curve in Fig. 2, in the region of the auditory pathway associated with the auditory image. AIM does not immediately predict that there should be transient fields like those that appear in the AEF. It would be relatively simple, however, to extend the model to include a process that monitors change in the SF, and the output of such a process would be expected to resemble the derivative of the SF function, which is shown by the dashed line in Fig. 2. Thus, the dashed line shows the polarity and form of the TFs that might be expected to accompany the predicted SF.
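The predicted time course can be approximated with a first-order leaky integrator. The sketch below is an assumption-laden illustration rather than the computation behind Fig. 2: it drives the integrator with a unit input while the regular click train is on (720 ms), uses the 30-ms half life quoted above, inverts the ordinate, and takes a finite-difference derivative as a stand-in for a process that monitors change in the SF. The function names, the 1-ms step, and the 1000-ms total duration are hypothetical.

```python
def predicted_sustained_field(duration_ms=720.0, half_life_ms=30.0,
                              total_ms=1000.0, step_ms=1.0):
    """Exponential build-up and decay of ridge height, with inverted ordinate."""
    decay = 0.5 ** (step_ms / half_life_ms)  # fraction retained per step
    level = 0.0
    times, levels = [], []
    for k in range(int(total_ms / step_ms)):
        t = k * step_ms
        target = 1.0 if t < duration_ms else 0.0  # regularity on or off
        level = target + (level - target) * decay
        times.append(t)
        levels.append(-level)  # goes down when pitch strength increases
    return times, levels

def derivative(values, step_ms=1.0):
    """Finite-difference derivative of a sampled curve."""
    return [(values[i + 1] - values[i]) / step_ms for i in range(len(values) - 1)]

times, sustained = predicted_sustained_field()
transient = derivative(sustained)
pitch_hz = 1000.0 / 12.0  # an ICI of 12 ms corresponds to about 83 Hz
```

In this sketch the sustained curve reaches half of its asymptote after 30 ms and about 90% after 100 ms, and the derivative shows a negative deflection at regularity onset and a positive deflection at offset, which matches the polarity and form of the TFs described in the text.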

It is clear from the outset that such a simple model of auditory processing is highly unlikely to explain the complicated dynamics of the AEF. Nevertheless, as we show in the discussion, the model does constrain the interpretation of the transient and sustained components of the AEF and enables us to use MEG data to study auditory processing in considerable detail. Specifically, comparison of the TFs and SFs evoked by regular and irregular click trains appears to confirm that, if there is an auditory image like that proposed in AIM, then it is in a subcortical structure (or maybe in A1), and the processing in lateral HG represents subsequent processing of pitch-related features after they appear in the auditory image.

Materials and methods

The SF involves at least two sources (Gutschalk et al., 2002); the TF involves multiple generators that overlap the SF in time (e.g., Gutschalk et al., 1998, Loveless et al., 1996, Lü et al., 1992, Scherg et al., 1989). In a volume conductor like the brain, adjacent current sources blend and, as a result, spatiotemporal source analysis cannot normally separate more than two or three processes within the confines of auditory cortex (Scherg, 1990). Thus, it is important to use an experimental

Overview

The MEG responses evoked by regular and irregular click trains provide several forms of supporting evidence for the hypothesis that there is a pitch processing region in the anterior part of auditory cortex (HG), and a separate region in the posterior part of auditory cortex (PT) that is highly active in response to the same stimuli at the same time, but which is not concerned with the presence or absence of temporal regularity in the click trains. The results are presented in three parts: (1)

Discussion

In the first part of the discussion, we will briefly address the advantages of the continuous stimulation paradigm, and then focus on the source analysis and the anatomical generators of the transient and sustained AEFs. The final part of the discussion is concerned with the temporal dynamics of the evoked pitch response and the implications for the physiology of pitch perception.

Acknowledgements

The MRI data were acquired in the Department of Neuroradiology, University of Heidelberg, Germany. We are grateful to Prof. Klaus Sartor and Dr. Sabine Heiland, who provided access to the MRI facilities and generous technical support. The research was supported by the Deutsche Forschungsgemeinschaft (Ru 652/1–3) and the UK Medical Research Council (G9900369, G9901257).

References (44)

  • P. Morosan et al. Human primary auditory cortex: cytoarchitectonic subdivisions and mapping into a spatial reference system. NeuroImage (2001)
  • R.D. Patterson et al. Complex sounds and auditory images
  • R.D. Patterson et al. The processing of temporal pitch and melody information in auditory cortex. Neuron (2002)
  • T.W. Picton et al. Human auditory sustained potentials: II. Stimulus relationships. Electroencephalogr. Clin. Neurophysiol. (1978)
  • J. Rademacher et al. Probabilistic mapping and volume measurement of human primary auditory cortex. NeuroImage (2001)
  • M. Scherg et al. Two bilateral sources of the late AEP as identified by a spatio-temporal dipole model. Electroencephalogr. Clin. Neurophysiol. (1985)
  • D.L. Barbour et al. Auditory cortical responses elicited in awake primates by random spectrum stimuli. J. Neurosci. (2003)
  • S. Biermann et al. Parallels between timing of onset responses of single neurons in cat and of evoked magnetic fields in human auditory cortex. J. Neurophysiol. (2000)
  • H. Braak. The pigment architecture of the human temporal lobe. Anat. Embryol. (1978)
  • A. Brechmann et al. Sound-level-dependent representation of frequency-modulations in human auditory cortex: a low noise fMRI study. J. Neurophysiol. (2002)
  • A. Galaburda et al. Cytoarchitectonic organization of the human auditory cortex. J. Comp. Neurol. (1980)
  • T.D. Griffiths et al. Analysis of temporal structure in sound by the brain. Nat. Neurosci. (1998)