
NeuroImage

Volume 26, Issue 2, June 2005, Pages 317-329

Support vector machines for temporal classification of block design fMRI data

https://doi.org/10.1016/j.neuroimage.2005.01.048

Abstract

This paper treats support vector machine (SVM) classification applied to block design fMRI, extending our previous work with linear discriminant analysis [LaConte, S., Anderson, J., Muley, S., Ashe, J., Frutiger, S., Rehm, K., Hansen, L.K., Yacoub, E., Hu, X., Rottenberg, D., Strother, S., 2003a. The evaluation of preprocessing choices in single-subject BOLD fMRI using NPAIRS performance metrics. NeuroImage 18, 10–27; Strother, S.C., Anderson, J., Hansen, L.K., Kjems, U., Kustra, R., Sidtis, J., Frutiger, S., Muley, S., LaConte, S., Rottenberg, D., 2002. The quantitative evaluation of functional neuroimaging experiments: the NPAIRS data analysis framework. NeuroImage 15, 747–771]. We compare SVM to canonical variates analysis (CVA) by examining the relative sensitivity of each method to ten combinations of preprocessing choices consisting of spatial smoothing, temporal detrending, and motion correction. Important to the discussion are the issues of classification performance, model interpretation, and validation in the context of fMRI. As the SVM has many unique properties, we examine the interpretation of support vector models with respect to neuroimaging data. We propose four methods for extracting activation maps from SVM models, and we examine one of these in detail. For both CVA and SVM, we have classified individual time samples of whole brain data, with TRs of roughly 4 s, thirty slices, and nearly 30,000 brain voxels, with no averaging of scans or prior feature selection.

Introduction

The advent of functional magnetic resonance imaging (fMRI) in the early 1990s provided a revolutionary means of non-invasively probing spatiotemporal variations in brain function. For whole brain studies, the number of acquired brain voxels can be in the tens of thousands, each sampled in time to yield hundreds of measurements. The flexibility of this technique in terms of experimental design and data analysis is virtually limitless. As has been stated in a variety of ways by several researchers, neuroimaging data are extremely rich in signal information and poorly characterized in terms of signal and noise structure (Cox and Savoy, 2003, Hansen et al., 2001, LaConte et al., 2003a, Lange et al., 1999, Skudlarski et al., 1999, Strother et al., 2002).

During this same period of fMRI development, advances in the interrelated fields of machine learning, data mining, and statistics have enhanced our ability to extract and characterize subtle features in data sets from a wide variety of scientific fields (Cherkassky and Mulier, 1998, Hastie et al., 2001, Mjolsness and DeCoste, 2001). Among these developments, support vector machines (SVMs) have been an active area of research and have been applied to a broad range of problems. SVMs arise from the Statistical Learning Theory of Vapnik (Vapnik, 1995) and possess several unique properties appropriate for real-world applications, including fMRI. Notably, the SVM formulation was motivated by problems with small sample sizes and high-dimensional inputs, which is precisely the situation encountered in temporal predictive modeling of fMRI data.

There are several reasons to consider temporal predictive modeling for fMRI data analysis. First, as argued by Strother and Hansen (Strother et al., 2002) and mentioned in Morch et al. (1997), from a Bayesian perspective there is no obvious mathematical advantage in estimating a spatial summary map from our knowledge of the experiment (e.g., general linear model approaches; Friston et al., 1995) over estimating those experimental parameters from our input patterns. Second, as demonstrated in LaConte et al. (2003a), Shaw et al. (2003), and Strother et al. (2002), prediction accuracy, along with other model performance metrics such as spatial pattern reproducibility, can be used as a data-dependent means of methodological validation. Currently, the most common tool for such validation is receiver operating characteristic (ROC) analysis (Constable et al., 1995, Hansen et al., 2001, Le and Hu, 1997, Metz, 1978, Skudlarski et al., 1999, Xiong et al., 1996), which measures a method's accuracy by plotting the true positive fraction of activated voxels against the false positive fraction as some modeling parameter is varied. Unlike standard ROC analysis in neuroimaging, the approach of LaConte et al. (2003a), Shaw et al. (2003), and Strother et al. (2002) need not rely on simulations. Third, predictive modeling explicitly uses the assumption that we have more reliable knowledge about the temporal aspects of the data than about the spatial activation patterns. This is the same assumption implicitly used for generating SPMs, interpreting “data-driven” results, and modeling the hemodynamic response, and it follows from the fact that we designed the temporal nature of the experiment and/or simultaneously measure behavioral performance. Finally, temporal predictive modeling is a much more natural framework for the recent interest in using fMRI for brain-computer interface (BCI) and biofeedback studies (LaConte et al., 2004).
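
To make the ROC construction concrete, the following is a minimal sketch (illustrative only, not code from this study) of how a simulation-based ROC curve is typically computed: a statistic map is thresholded at a series of levels, and at each level the true positive fraction over truly active voxels is paired with the false positive fraction over truly inactive voxels. The function name, toy data, and the 5% activation fraction are assumptions.

```python
# Minimal sketch of a simulation-based ROC curve for activation detection.
import numpy as np

def roc_curve(stat_map, truth_mask, n_thresholds=100):
    """stat_map: 1-D array of voxel statistics; truth_mask: boolean array
    marking the (simulated) truly active voxels."""
    thresholds = np.linspace(stat_map.max(), stat_map.min(), n_thresholds)
    n_active = truth_mask.sum()
    n_inactive = (~truth_mask).sum()
    tpf, fpf = [], []
    for t in thresholds:
        detected = stat_map >= t
        tpf.append((detected & truth_mask).sum() / n_active)    # true positive fraction
        fpf.append((detected & ~truth_mask).sum() / n_inactive)  # false positive fraction
    return np.array(fpf), np.array(tpf)

# Toy usage with simulated data: "active" voxels get a shifted statistic.
rng = np.random.default_rng(0)
truth = np.zeros(30000, dtype=bool)
truth[:1500] = True
stats = rng.normal(size=30000) + 2.0 * truth
fpf, tpf = roc_curve(stats, truth)
auc = float(np.sum(np.diff(fpf) * (tpf[1:] + tpf[:-1]) / 2))  # trapezoidal area
print("area under the ROC curve ~", auc)
```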

The work in predictive modeling has primarily been developed by Strother and Hansen (Hansen et al., 1999, Kjems et al., 2002, Kustra and Strother, 2001, LaConte et al., 2003a, Lautrup et al., 1994, Morch et al., 1997, Shaw et al., 2003, Strother et al., 2002), with recent explicit testing of distributed brain systems by Haxby et al. (2001) and Cox and Savoy (2003). The implication of the classification setting is that fMRI can be used to predict brain states and thereby enhance our understanding of brain systems, in contrast to the standard emphasis on spatial mapping. Recently, Strother has introduced a formal framework in which predictive modeling plays a prominent role. This framework, termed NPAIRS for Nonparametric Prediction, Activation, Influence, and Reproducibility reSampling (Strother et al., 2002), provides a disciplined approach for exploring multivariate signal and noise subspaces and the impact on them of factors such as experimental and data analysis parameters, as well as the influence of outliers.
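
As a rough illustration of the resampling idea behind NPAIRS, the sketch below (a hypothetical simplification, not the IDL/VAST implementation used here) splits the data in half repeatedly, lets each half classify the other to obtain a prediction accuracy, and correlates the two independently derived spatial maps to obtain a reproducibility score. The correlation-based "analysis model" and all names are placeholders.

```python
# Hedged sketch of split-half resampling in the spirit of NPAIRS
# (illustrative only; not the IDL/VAST implementation used in the paper).
import numpy as np

def fit_linear_map(X, y):
    # Placeholder "analysis model": voxel-wise covariance with the labels,
    # normalized to unit length; used as both spatial map and classifier weights.
    Xc = X - X.mean(axis=0)
    w = Xc.T @ y
    return w / (np.linalg.norm(w) + 1e-12)

def split_half_metrics(X, y, n_splits=10, seed=0):
    """X: scans x voxels; y: class labels in {-1, +1}."""
    rng = np.random.default_rng(seed)
    accs, reps = [], []
    n = len(y)
    for _ in range(n_splits):
        perm = rng.permutation(n)
        a, b = perm[: n // 2], perm[n // 2:]
        w_a = fit_linear_map(X[a], y[a])
        w_b = fit_linear_map(X[b], y[b])
        # Prediction: each half classifies the other half's scans by the sign
        # of the projection onto its map (bias term omitted for brevity).
        acc_ab = np.mean(np.sign((X[b] - X[a].mean(axis=0)) @ w_a) == y[b])
        acc_ba = np.mean(np.sign((X[a] - X[b].mean(axis=0)) @ w_b) == y[a])
        accs.append(0.5 * (acc_ab + acc_ba))
        # Reproducibility: correlation between the two independent spatial maps.
        reps.append(np.corrcoef(w_a, w_b)[0, 1])
    return float(np.mean(accs)), float(np.mean(reps))

# Toy usage: 80 scans x 5000 voxels with a weak class-dependent signal.
rng = np.random.default_rng(1)
y = np.tile([1, -1], 40)
X = rng.normal(size=(80, 5000))
X[y == 1, :200] += 0.3
print(split_half_metrics(X, y))
```

In NPAIRS proper, these two kinds of metrics (prediction and reproducibility) are computed for a genuine analysis model such as CVA and used to compare preprocessing and modeling choices.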

The use of SVM has recently been reported in the fMRI literature (Cox and Savoy, 2003, LaConte et al., 2003b). In LaConte et al. (2003b), we examined the efficacy of SVM relative to CVA and discussed SVM model interpretation. Here, we greatly expand that initial study. The work of Cox and Savoy (2003) differs from our approach in several respects. First, we classify individual scans, with TRs of roughly 4 s, rather than 20-s blocks. Second, whereas Cox and Savoy used ten classes of visual objects, we focus on the two-class problem to illustrate the important issue of visualization and interpretation of SVM models applied to fMRI. Finally, we build our SVM models from whole brain data, without selecting voxels through an initial statistical parametric mapping step.

In the context of BCI-type studies such as LaConte et al. (2004) or analyses of distributed representations of sensory information (Cox and Savoy, 2003), predictive modeling may be the ultimate goal. A major impetus for performing MRI-based experiments in the first place, however, is to obtain spatially localized information. For such data, one advantage of predictive modeling is that it allows for spatially distributed patterns of activation while also incorporating the temporal structure of the experiment; in other words, these are multivariate approaches. The resulting spatial summary maps aid model interpretation and provide a tangible means of comparing different models (e.g., Hansen et al., 2001). For the SVM, generating these summary maps requires special consideration, and we outline four methods for doing so, demonstrating one as an example.
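
As an illustration of one mapping strategy for a linear SVM (a hedged sketch of a common approach, not necessarily the specific method demonstrated later in the paper), the trained weight vector, which for a linear kernel equals the support-vector expansion w = Σ_i α_i y_i x_i, can be reshaped voxel-wise back into image space as a discrimination map. The simulated data, array shapes, and the use of scikit-learn are assumptions for illustration.

```python
# Hedged sketch: linear soft-margin SVM on individual whole-brain scans
# (two classes), with the weight vector reshaped into a spatial map.
# Simulated data and scikit-learn are assumptions for illustration only.
import numpy as np
from sklearn.svm import SVC

shape = (30, 32, 32)                       # (slices, rows, cols) ~ 30,000 voxels
n_voxels = int(np.prod(shape))
n_scans = 120
rng = np.random.default_rng(0)

X = rng.normal(size=(n_scans, n_voxels))
y = np.tile([1, -1], n_scans // 2)         # alternating task/control labels
X[y == 1, :500] += 0.5                     # toy "activation" in a few voxels

clf = SVC(kernel="linear", C=1.0)          # linear kernel: input space = feature space
clf.fit(X, y)

# For a linear kernel the weight vector equals the support-vector expansion
# w = sum_i alpha_i y_i x_i (here alpha_i * y_i is stored in dual_coef_).
w = clf.coef_.ravel()
w_from_svs = (clf.dual_coef_ @ clf.support_vectors_).ravel()
assert np.allclose(w, w_from_svs, atol=1e-6)

weight_map = w.reshape(shape)              # voxel-wise discrimination map
print("training accuracy:", (clf.predict(X) == y).mean())
print("map shape:", weight_map.shape)
```

Note that such a map reflects the multivariate discriminant as a whole, so individual voxel weights should not be read as univariate activation statistics.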

This article has the explicit aim of formally describing SVM classification in the application domain of fMRI analysis and of proposing interpretation (mapping) strategies for this application. We give a careful description of SVM classifiers and examine more closely the selection and tuning of SVM model parameters. We then illustrate SVM classification through comparisons to CVA, a previously published technique. We do not attempt a definitive verdict on CVA vs. SVM, but rather highlight some currently known merits of both approaches within the fMRI application domain.


Theory

Here, we summarize only the salient concepts of SVM-based classification that are essential for describing its application to fMRI. For a more general treatment, please refer to Burges (1998), Cherkassky and Mulier (1998), and Muller et al. (2001). See Kjems et al. (2002), LaConte et al. (2003a), and Strother et al. (2002) for a description of CVA. In particular, we discuss the classification setting and its relevance to fMRI data analysis, the use of SVM classifiers, and the interpretation of SVM models.
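
Since this section is truncated on the article page, the standard two-class soft-margin SVM formulation (as given by Cortes and Vapnik, 1995, and Burges, 1998) is restated below for reference; it is a generic statement rather than text reproduced from the article.

```latex
% Standard two-class soft-margin SVM (cf. Cortes and Vapnik, 1995; Burges, 1998).
% Training scans x_i in R^N (N ~ number of brain voxels), labels y_i in {-1, +1}.
\begin{align}
  \min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \quad
    & \tfrac{1}{2}\lVert \mathbf{w} \rVert^{2} + C \sum_{i=1}^{\ell} \xi_{i} \\
  \text{subject to} \quad
    & y_{i}\left( \mathbf{w} \cdot \mathbf{x}_{i} + b \right) \ge 1 - \xi_{i},
      \qquad \xi_{i} \ge 0, \quad i = 1, \ldots, \ell .
\end{align}
% The dual solution expands the weights over the support vectors,
% \mathbf{w} = \sum_{i} \alpha_{i} y_{i} \mathbf{x}_{i}, and a new scan x is
% classified by f(\mathbf{x}) = \mathrm{sign}(\mathbf{w} \cdot \mathbf{x} + b).
```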

Methods

The SVM implementation used was SVMlight (Joachims, 1999). For I/O speed considerations, we modified this C-based software to read binary image files. CVA along with NPAIRS was implemented in IDL (RSI, Boulder, CO) and is part of the VAST software library (http://neurovia.umn.edu/incweb/npairs_info.html) at the VA Medical Center, Minneapolis, Minnesota. Visualization of SVM models was accomplished with Matlab (MathWorks, Natick, MA) and AFNI (Cox, 1996, Cox and Hyde, 1997).

Results

Our initial exploration of polynomial kernels convinced us to focus on linear kernels (where the input space is equivalent to the feature space); Fig. 2 shows our justification for this choice. It also represents the first of many result “images,” in which gray scale encodes the value of a result (in this case polynomial order), with one row per subject and one column per preprocessing combination. Readers are again referred to Table 1 for the preprocessing abbreviations used in these figures.
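
For context, the inhomogeneous polynomial kernel family commonly used in such comparisons is shown below (a generic form; the exact parameterization explored in the study is not visible in this snippet). For order p = 1 the kernel is, up to a constant offset, the ordinary dot product, so the implicit feature space coincides with the voxel input space and the weight vector can be read directly as a spatial map.

```latex
% Polynomial kernel of order p (generic form; assumed, not quoted from the paper).
\begin{equation}
  K(\mathbf{x}_{i}, \mathbf{x}_{j})
    = \left( \mathbf{x}_{i} \cdot \mathbf{x}_{j} + 1 \right)^{p},
  \qquad p = 1, 2, 3, \ldots
\end{equation}
```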

From our resampling

Discussion

Focusing primarily on linear (soft margin) SVM classifiers, we have described many of the issues important to block design fMRI. To assess performance, prediction accuracy results of tuned SVM models were compared with our recent CVA work across sixteen subjects and ten preprocessing combinations. We also discussed various aspects of interpretation and visualization of SVM models in the context of fMRI.

Three important issues relevant to this work require further elaboration. These are

Acknowledgments

Many people have helped with various aspects of this project. We especially wish to acknowledge Dr. Jihong Chen, Dr. Yasser Kadah, Dr. Scott Peltier, Dr. Shing-Chung Ngan, Mr. Kirt Schaper, and Dr. Kelly Rehm.

References (41)

  • S.C. Strother et al. The quantitative evaluation of functional neuroimaging experiments: the NPAIRS data analysis framework. NeuroImage (2002)
  • P.A. Bandettini et al. Processing strategies for time-course data sets in functional MRI of the human brain. Magn. Reson. Med. (1993)
  • C. Burges. A tutorial on support vector machines for pattern recognition. Data Min. Knowledge Discov. (1998)
  • V. Cherkassky et al. Learning from data: concepts, theory, and methods (1998)
  • T.R. Constable et al. An ROC approach for evaluating functional brain MR imaging and postprocessing protocols. Magn. Reson. Med. (1995)
  • C. Cortes et al. Support-vector networks. Mach. Learn. (1995)
  • R.W. Cox et al. Software tools for analysis and visualization of FMRI data. NMR Biomed. (1997)
  • B. Efron et al. An introduction to the bootstrap (1993)
  • J.H. Friedman. An overview of predictive learning and function approximation
  • K.J. Friston et al. Statistical parametric maps in functional neuroimaging: a general linear approach. Hum. Brain Mapp. (1995)