Support vector machines for temporal classification of block design fMRI data
Introduction
The advent of functional magnetic resonance imaging (fMRI) in the early 1990s has provided a revolutionary means for non-invasively probing spatiotemporal variations of brain function. For whole-brain studies, the number of acquired brain voxels can be in the tens of thousands, each sampled in time over hundreds of measurements. The flexibility of this technique in terms of experimental design and data analysis is virtually limitless. As has been stated in a variety of ways by several researchers, neuroimaging data are extremely rich in signal information yet poorly characterized in terms of signal and noise structure (Cox and Savoy, 2003, Hansen et al., 2001, LaConte et al., 2003a, Lange et al., 1999, Skudlarski et al., 1999, Strother et al., 2002).
During this same period of fMRI development, advances in the interrelated fields of machine learning, data mining, and statistics have enhanced our capabilities to extract and characterize subtle features in data sets from a wide variety of scientific fields (Cherkassky and Mulier, 1998, Hastie et al., 2001, Mjolsness and DeCoste, 2001). Among these developments, support vector machines (SVMs) have been an active area of research and have been applied to a broad range of problems. SVMs arise from the Statistical Learning Theory of Vapnik (Vapnik, 1995) and possess several unique properties appropriate for real-world applications, including fMRI. Among these is the fact that the SVM formulation was motivated by small sample sizes and high dimensional inputs, a situation that matches temporally predictive modeling of fMRI data.
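As a concrete illustration of this regime, the following sketch trains a soft-margin linear SVM on data shaped like a single-subject fMRI run: far more voxels (features) than scans (samples). All names, dimensions, and signal strengths are simulated assumptions for illustration, not values from the paper.

```python
# Illustrative only: a linear SVM in the "few scans, many voxels" regime.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_scans, n_voxels = 80, 20000                # ~10^2 time points, ~10^4 voxels
labels = np.repeat([1, -1], n_scans // 2)    # two experimental conditions

# Simulated scans: noise plus a weak condition-dependent signal in a small voxel subset
X = rng.standard_normal((n_scans, n_voxels))
X[:, :50] += 0.5 * labels[:, None]

clf = SVC(kernel="linear", C=1.0)            # soft-margin linear SVM
clf.fit(X, labels)
print(clf.score(X, labels))                  # training accuracy
```

Because the number of features far exceeds the number of samples, the training data are typically linearly separable, which is precisely why margin-based regularization (rather than fitting accuracy alone) governs generalization here.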
There are several reasons to consider temporal predictive modeling for fMRI data analysis. First, as argued by Strother and Hansen (Strother et al., 2002) and mentioned in Morch et al. (1997), from a Bayesian perspective, there is no obvious mathematical advantage for choosing to estimate a spatial summary map from our knowledge of the experiment (e.g. general linear model approaches (Friston et al., 1995)) over trying to estimate these experimental parameters from our input patterns. Second, as demonstrated in LaConte et al. (2003a), Shaw et al. (2003), and Strother et al. (2002), prediction accuracy along with other model performance metrics such as spatial pattern reproducibility can be used as a data-dependent means of methodological validation. Currently, the most common tool for such validation is receiver operating characteristic (ROC) analysis (Constable et al., 1995, Hansen et al., 2001, Le and Hu, 1997, Metz, 1978, Skudlarski et al., 1999, Xiong et al., 1996), which measures a method's accuracy by plotting the true positive fraction of activated pixels against the false positive fraction as some modeling parameter is varied. Unlike standard ROC analysis in neuroimaging, the approach of LaConte et al. (2003a), Shaw et al. (2003), and Strother et al. (2002) need not rely on simulations. Third, predictive modeling explicitly uses the assumption that we have more reliable knowledge about the temporal aspects of the data than the spatial activation patterns. This is the same assumption implicitly used for generating SPMs, interpreting “data-driven” results, and modeling the hemodynamic response, which follows from the fact that we designed the temporal nature of the experiment and/or simultaneously measure behavioral performance. Finally, temporal predictive modeling is a much more natural way to examine the recent interest in using fMRI for brain computer interface (BCI) and biofeedback studies (LaConte et al., 2004).
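The ROC construction described above can be sketched in a few lines: sweep a detection threshold over a per-voxel statistic with known ground truth and trace true positive fraction against false positive fraction. The statistic values, voxel counts, and effect size below are simulated assumptions, not data from any cited study.

```python
# Illustrative ROC sketch: TPF vs. FPF of "activated" voxels over a threshold sweep.
import numpy as np

rng = np.random.default_rng(1)
truth = np.zeros(1000, dtype=bool)
truth[:100] = True                              # 100 truly active voxels (simulated)
stat = rng.standard_normal(1000) + 2.0 * truth  # detection statistic per voxel

thresholds = np.linspace(stat.min(), stat.max(), 50)
tpf = [(stat[truth] > t).mean() for t in thresholds]   # true positive fraction
fpf = [(stat[~truth] > t).mean() for t in thresholds]  # false positive fraction

# Area under the ROC curve via the rank (Mann-Whitney) formulation
auc = (stat[truth][:, None] > stat[~truth][None, :]).mean()
print(round(auc, 2))
```

The point of contrast made in the text is that this construction requires knowing `truth`, which for real data typically means simulation; prediction-based validation sidesteps that requirement by scoring against the known experimental design instead.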
The work in predictive modeling has primarily been developed by Strother and Hansen (Hansen et al., 1999, Kjems et al., 2002, Kustra and Strother, 2001, LaConte et al., 2003a, Lautrup et al., 1994, Morch et al., 1997, Shaw et al., 2003, Strother et al., 2002) with recent explicit testing of distributed brain systems by Haxby et al. (Haxby et al., 2001) and Cox and Savoy (Cox and Savoy, 2003). The implication of the classification setting is that fMRI can be used for predicting brain states to enhance our understanding of brain systems, rather than the standard emphasis on spatial mapping. Recently, Strother has introduced a formal framework in which predictive modeling plays a prominent role. This framework, termed NPAIRS for Nonparametric Prediction, Activation, Influence, and Reproducibility reSampling (Strother et al., 2002), provides a disciplined approach for exploring multivariate signal and noise spaces and the impact of various factors such as experimental and data analysis parameters as well as the influence of outliers on these subspaces.
The use of SVM has recently been reported in the fMRI literature (Cox and Savoy, 2003, LaConte et al., 2003b). In LaConte et al. (2003b), we dealt with the efficacy of SVM compared to canonical variates analysis (CVA) and discussed SVM model interpretation. Here, we greatly expand that initial study. The work of Cox and Savoy (Cox and Savoy, 2003) differs from our approach on several points. First, we classify individual scans, with TRs of roughly 4 s, rather than 20-s blocks. Whereas Cox and Savoy used ten classes of visual objects, we focus on the two-class problem to illustrate the important issue of visualization and interpretation of SVM models applied to fMRI. Finally, we build our SVM models on whole brain data, without selectively choosing voxels through an initial statistical parametric mapping.
In the context of BCI-type studies such as LaConte et al. (2004) or analyses of distributed representations of sensory information (Cox and Savoy, 2003), predictive modeling may be the ultimate goal. A major impetus for performing MRI-based experiments in the first place, however, is to obtain spatially localized information. With such experimental data, one advantage of predictive modeling is that it allows for spatially distributed patterns of activation while also incorporating the temporal structure of the experiment. In other words, we are dealing with multivariate approaches. These spatial summary maps aid model interpretation and provide a tangible means of comparing different models (e.g. (Hansen et al., 2001)). For the SVM, generation of these summary maps requires special consideration, and we outline four methods for doing this, demonstrating one as an example.
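For a linear kernel, one natural route to such a summary map follows from the decision function f(x) = w·x + b: the weight vector w lies in voxel space and can be reshaped into a brain volume. The sketch below illustrates only this one possibility (using scikit-learn's `coef_` attribute); the toy volume shape, scan count, and signal placement are assumptions for illustration.

```python
# Illustrative only: reshaping a linear SVM weight vector into a spatial map.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)
shape = (8, 8, 8)                          # toy volume; real acquisitions are larger
n_voxels = int(np.prod(shape))
n_scans = 40
labels = np.repeat([1, -1], n_scans // 2)

X = rng.standard_normal((n_scans, n_voxels))
X[:, :10] += 0.8 * labels[:, None]         # simulated signal in the first 10 voxels

clf = SVC(kernel="linear").fit(X, labels)
w_map = clf.coef_.ravel().reshape(shape)   # voxel-wise weights as a volume
print(w_map.shape)
```

Because w is a weighted sum of the support-vector scans, such a map is a property of the fitted model rather than a voxel-wise statistic, which is why its interpretation requires the special consideration noted above.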
This article has the explicit aim of formally describing SVM classification in the application domain of fMRI analysis and of proposing interpretation (mapping) strategies for this application. We give a careful description of SVM classifiers and examine more closely the selection and tuning of SVM model parameters. We then illustrate SVM classification through comparisons to CVA, a previously published technique. We do not attempt a definitive verdict on CVA vs. SVM, but rather highlight some currently known merits of both approaches within the fMRI application domain.
Theory
Here, we summarize only the salient concepts for SVM-based classification that are essential for describing its application to fMRI. For a more general treatment, please refer to Burges (1998), Cherkassky and Mulier (1998), and Muller et al. (2001). See Kjems et al. (2002), LaConte et al. (2003a), and Strother et al. (2002) for a description of CVA. In particular, we discuss the classification setting and its relevance to fMRI data analysis, the use of SVM classifiers, and interpretation of SVM models.
Methods
The SVM implementation used was SVMlight (Joachims, 1999). For I/O speed considerations, we modified this C-based software to read binary image files. CVA along with NPAIRS was implemented in IDL (RSI, Boulder, CO) and is part of the VAST software library (http://neurovia.umn.edu/incweb/npairs_info.html) at the VA Medical Center, Minneapolis, Minnesota. Visualization of SVM models was accomplished with Matlab (MathWorks, Natick, MA) and AFNI (Cox, 1996, Cox and Hyde, 1997).
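The binary-I/O modification mentioned above amounts to reading image data as raw numeric bytes rather than parsing text. A minimal Python sketch of the same idea is below; the file name, dimensions, and float32 layout are assumptions for illustration, not the actual SVMlight file format.

```python
# Illustrative only: raw binary scan storage, loaded by byte reinterpretation
# rather than text parsing (the idea behind the binary-I/O modification).
import os
import tempfile

import numpy as np

n_scans, n_voxels = 100, 20000
data = np.random.default_rng(4).standard_normal((n_scans, n_voxels)).astype(np.float32)

path = os.path.join(tempfile.gettempdir(), "scans_example.raw")  # hypothetical file
data.tofile(path)                                                # raw binary dump

# Loading is a single read plus reshape: no per-value string conversion
loaded = np.fromfile(path, dtype=np.float32).reshape(n_scans, n_voxels)
print(np.array_equal(loaded, data))
```

For matrices of this size, avoiding per-value text conversion is typically the dominant I/O saving, which is the motivation the Methods give for the modification.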
Results
Our initial exploration of polynomial kernels convinced us to focus on linear kernels (where the input space is equivalent to the feature space). Fig. 2 shows our justification for this. It also represents the first of many result “images,” where gray scale represents the value of a result (in this case polynomial order), with one row per subject and one column per preprocessing combination. Readers are again referred to Table 1 for the preprocessing abbreviations used for these figures.
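The kind of kernel comparison described can be sketched as follows: score polynomial kernels of increasing order by cross-validated prediction accuracy on two-class data. Everything below (data, effect size, C, coef0, fold count) is a simulated assumption; none of these numbers come from the study.

```python
# Illustrative only: comparing polynomial kernel orders by cross-validated accuracy.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(3)
n_scans, n_voxels = 60, 500
labels = np.repeat([1, -1], n_scans // 2)
X = rng.standard_normal((n_scans, n_voxels))
X[:, :20] += 0.7 * labels[:, None]         # simulated signal in 20 of 500 voxels

acc = {}
for degree in (1, 2, 3):
    clf = SVC(kernel="poly", degree=degree, coef0=1.0, C=1.0)
    acc[degree] = cross_val_score(clf, X, labels, cv=5).mean()
    print(degree, round(acc[degree], 2))
```

Note that degree 1 reduces to an (affinely shifted) linear kernel, so a sweep like this directly exposes whether higher-order feature spaces buy any predictive accuracy over the linear case.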
From our resampling
Discussion
Focusing primarily on linear (soft margin) SVM classifiers, we have described many of the issues important to block design fMRI. To assess performance, prediction accuracy results of tuned SVM models were compared with our recent CVA work across sixteen subjects and ten preprocessing combinations. We also discussed various aspects of interpretation and visualization of SVM models in the context of fMRI.
Three important issues relevant to this work require further elaboration. These are
Acknowledgments
Many people have helped with various aspects of this project. We especially wish to acknowledge Dr. Jihong Chen, Dr. Yasser Kadah, Dr. Scott Peltier, Dr. Shing-Chung Ngan, Mr. Kirt Schaper, and Dr. Kelly Rehm.
References
- AFNI: software for analysis and visualization of functional magnetic resonance neuroimages. Comput. Biomed. Res. (1996)
- Functional magnetic resonance imaging (fMRI) “brain reading”: detecting and classifying distributed patterns of fMRI activity in human visual cortex. NeuroImage (2003)
- Generalizable patterns in neuroimaging: how many principal components? NeuroImage (1999)
- The quantitative evaluation of functional neuroimaging experiments: generalization error and learning curves. NeuroImage (2002)
- The evaluation of preprocessing choices in single-subject BOLD fMRI using NPAIRS performance metrics. NeuroImage (2003)
- Plurality and resemblance in fMRI data analysis. NeuroImage (1999)
- Basic principles of ROC analysis. Semin. Nucl. Med. (1978)
- Evaluating subject specific preprocessing choices in multisubject fMRI data sets using data-driven performance metrics. NeuroImage (2003)
- ROC analysis of statistical methods used in functional MRI: individual subjects. NeuroImage (1999)
- The connection between regularization operators and support vector kernels. Neural Netw. (1998)
- The quantitative evaluation of functional neuroimaging experiments: the NPAIRS data analysis framework. NeuroImage (2002)
- Processing strategies for time-course data sets in functional MRI of the human brain. Magn. Reson. Med.
- A tutorial on support vector machines for pattern recognition. Data Min. Knowledge Discov. (1998)
- Learning from data: concepts, theory, and methods (1998)
- An ROC approach for evaluating functional brain MR imaging and postprocessing protocols. Magn. Reson. Med. (1995)
- Support-vector networks. Mach. Learn.
- Software tools for analysis and visualization of FMRI data. NMR Biomed. (1997)
- An introduction to the bootstrap
- An overview of predictive learning and function approximation
- Statistical parametric maps in functional neuroimaging: a general linear approach. Hum. Brain Mapp. (1995)