
NeuroImage

Volume 20, Issue 3, November 2003, Pages 1865-1871

Regular article
Automated method for extracting response latencies of subject vocalizations in event-related fMRI experiments

https://doi.org/10.1016/j.neuroimage.2003.07.020

Abstract

For functional magnetic resonance imaging studies of the neural substrates of language, the ability to have subjects produce overt verbal responses while in the scanner environment is important for several reasons. Most directly, overt responses allow the investigator to measure the accuracy and reaction time of the behavior. One problem, however, is that magnetic resonance gradient noise obscures the audio recordings made of voice responses, making it difficult to discern subject responses and to calculate reaction times. ASSERT (Adaptive Spectral Subtraction for Extracting Response Times), an algorithm for removing MR gradient noise from audio recordings of subject responses, is described here. The signal processing improves the intelligibility of the responses and also allows automated extraction of reaction times. The ASSERT-derived response times were comparable to manually measured times, with a mean difference of −8.75 ms (standard deviation of difference = 26.2 ms). These results support the use of ASSERT for extracting response latencies and scoring overt verbal responses.

Introduction

The use of functional magnetic resonance imaging (fMRI) to study the neural substrates of language has proven to be effective (see, e.g., Ojemann et al., 1998; Bookheimer, 2002; Phelps et al., 1997; Turkeltaub et al., 2002; Binder et al., 1997), but many studies have avoided the use of overt verbal responding, despite the common use of this response modality in behavioral and cognitive language studies. Two major problems have been identified in the use of verbal responses.

First, overt speech can produce movement and susceptibility artifacts that degrade the statistical quality of the images (Barch et al., 1999). This problem has been addressed through the use of event-related designs and special pulse sequences (Barch et al., 1999; Birn et al., 1999; Eden et al., 1999; Palmer et al., 2001; de Zubicaray et al., 2001; Huang et al., 2002).

The second problem is that echo planar imaging blood oxygenation-level dependent (EPI BOLD) sequences generate gradient noise that obscures the subject's spoken responses (Munhall et al., 2001). As a consequence, a simple microphone threshold is often insufficient to detect the response onset, making it difficult to measure the voice reaction time. Furthermore, the noise hinders assessment of the content of the subject's response, which is critical to some event-related experiments (e.g., Schlaggar et al., 2002). Making this problem more difficult, the EPI sequence gradient noise spectrum overlaps the human speech spectrum, so simple filtering cannot remove the noise without also degrading the speech signal. Thus, response times were previously extracted manually, by measuring the time from stimulus presentation to voice response onset in the recorded audio signal with a sound editing program (Palmer et al., 2001; Schlaggar et al., 2002). This process is labor-intensive and somewhat subjective, which may introduce user variability.

ASSERT (Adaptive Spectral Subtraction for Extracting Response Times) was developed to address this second problem by removing much of the MR gradient noise from the auditory recordings, both improving the voice response intelligibility and facilitating automated extraction of reaction time latencies based on a simple signal amplitude measurement.
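The general approach, spectral subtraction of an estimated noise spectrum followed by amplitude thresholding for onset detection, can be sketched as below. This is an illustrative reconstruction of the technique, not the published ASSERT implementation; the frame length, overlap, over-subtraction factor, and detection threshold are all assumed parameters.

```python
import numpy as np

def spectral_subtract(signal, noise_only, frame_len=1024, hop=512, alpha=2.0):
    """Suppress quasi-stationary noise via magnitude spectral subtraction.

    noise_only: a recording segment containing gradient noise but no speech.
    alpha: over-subtraction factor (assumed value, not from the paper).
    """
    window = np.hanning(frame_len)
    # Average magnitude spectrum of the noise, from the noise-only segment.
    noise_mag = np.mean(
        [np.abs(np.fft.rfft(noise_only[i:i + frame_len] * window))
         for i in range(0, len(noise_only) - frame_len + 1, hop)],
        axis=0)

    # Zero-pad so every input sample gets full analysis-window overlap.
    x = np.concatenate([np.zeros(frame_len), signal, np.zeros(frame_len)])
    out = np.zeros(len(x))
    norm = np.zeros(len(x))
    for i in range(0, len(x) - frame_len + 1, hop):
        spec = np.fft.rfft(x[i:i + frame_len] * window)
        mag, phase = np.abs(spec), np.angle(spec)
        # Subtract the scaled noise spectrum; clamp negative magnitudes to zero.
        clean_mag = np.maximum(mag - alpha * noise_mag, 0.0)
        frame = np.fft.irfft(clean_mag * np.exp(1j * phase), n=frame_len)
        out[i:i + frame_len] += frame * window      # overlap-add synthesis
        norm[i:i + frame_len] += window ** 2
    out /= np.maximum(norm, 1e-8)
    return out[frame_len:frame_len + len(signal)]

def onset_time(clean, fs, threshold):
    """Time (s) of the first sample whose amplitude exceeds the threshold."""
    above = np.flatnonzero(np.abs(clean) > threshold)
    return above[0] / fs if above.size else None

# Synthetic demonstration: white "gradient" noise with a tone starting at 1.0 s.
fs = 8000
rng = np.random.default_rng(0)
noise = 0.1 * rng.standard_normal(fs)           # noise-only reference segment
rec = 0.1 * rng.standard_normal(2 * fs)
rec[fs:] += np.sin(2 * np.pi * 440 * np.arange(fs) / fs)  # "voice" onset at 1.0 s
clean = spectral_subtract(rec, noise)
t = onset_time(clean, fs, threshold=0.3)
```

On this synthetic input the pre-onset noise is largely suppressed, so the simple amplitude threshold locates the onset near 1.0 s; in the real recordings the noise spectrum is structured rather than white, which is where an adaptive noise estimate matters.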

Section snippets

Recording apparatus

Audio recordings of spoken responses from subjects were made during fMRI image acquisition on a Siemens MAGNETOM Vision 1.5-T scanner (Erlangen, Germany). Functional images were acquired using a nearly continuous acquisition EPI BOLD sequence (TR = 2.5 s, 16 slices, 3.75 × 3.75 × 8-mm voxels), comparable to the EPI sequences used for other fMRI studies (Kelley et al., 1998; Palmer et al., 2001; Schlaggar et al., 2002), with a short delay of 50 ms between whole brain (frame) acquisitions to allow

Results

A representative BOLD run is depicted in Fig. 2. Qualitatively, the subject responses are barely discernible from the underlying gradient noise in the raw signal. By contrast, in the ASSERT-processed signal the responses are readily evident.

Table 1 shows the mean differences between automatic and manual timing for each BOLD run, as well as the mean across BOLD runs and the standard deviation of the differences. In three of four BOLD runs, ASSERT averaged a shorter response time than

Discussion

The main finding from this study is that ASSERT consistently yielded response times similar to those derived manually. Both the mean and the standard deviation of the response-time differences were smaller than the standard deviation of the response times themselves, even for simple word reading. From Table 1, the averaged difference between manual and ASSERT response times for a given BOLD run fell into a range of roughly ±15 ms. The average difference across all four BOLD runs was less
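The agreement statistics reported here (mean and standard deviation of the paired timing differences) are straightforward to compute for any set of paired measurements. The values below are made-up illustrations, not data from the paper:

```python
import numpy as np

# Hypothetical paired onset times in ms (automatic vs. manual scoring);
# illustrative values only, not the study's data.
auto_ms   = np.array([512.0, 633.0, 741.0, 598.0, 805.0])
manual_ms = np.array([520.0, 640.0, 730.0, 610.0, 815.0])

diff = auto_ms - manual_ms
mean_diff = diff.mean()        # systematic bias between the two methods
sd_diff = diff.std(ddof=1)     # spread of the disagreement across trials
```

A mean difference near zero indicates no systematic bias, while a small standard deviation of differences (relative to the spread of the response times themselves) indicates trial-by-trial agreement, the same logic applied to Table 1.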

Conclusions

Our adaptive spectral subtraction algorithm, ASSERT, was able to extract response latencies of subject vocalizations quickly, accurately, and reliably. The response times correlated well with manually extracted response times, with only a small average difference. No responses were missed, and no noise peaks were incorrectly scored as responses. Not only does this software eliminate the need for manual scoring of response times, but the removal of noise from the recording also makes

Acknowledgements

This work was approved by the Washington University Human Studies Committee and was supported by grants from the McDonnell Center for the Study of Higher Brain Function, NIH Grants LM06858 (S.E.P.), NS41255 (S.E.P.), and NS55582 (B.L.S.). B.L.S. is a Scholar of the Child Health Research Center of Excellence in Developmental Biology at Washington University School of Medicine (HD01487).

