Application of artificial neural network to fMRI regression analysis
Introduction
In functional magnetic resonance imaging (fMRI) studies, brain activations are detected by searching for signal changes that correlate with certain events. In the general linear model (GLM) approach (Friston et al., 1995), a regressor representing the canonical hemodynamic response function (HRF) is used to detect such correlations. However, if the shape of the hemodynamic response differs greatly from the pre-assumed shape (Aguirre et al., 1998, Miezin et al., 2000), or if an unknown process mediates such correlations, these correlations cannot be detected. In some cases, a combination of regressors (the canonical HRF, its temporal derivative, and a dispersion derivative) is used to absorb this diversity of response shape (Friston et al., 1998a, Friston et al., 1998b). Although this combination can absorb minor deviations from the canonical HRF, it cannot absorb all diversity, and response variability remains a problem.
Various other approaches, which do not assume the shape of the HRF a priori, have been proposed including selective averaging (Dale and Buckner, 1997), smooth FIR filters (Goutte et al., 2000), and non-parametric Bayesian estimation of the HRF (Marrelec et al., 2003). The primary advantage of these methodologies, which are called non-parametric methods because no parametric models of the HRF are used, is that even when the response functions diverge from region to region or subject to subject, correlated responses can be detected.
In this study, we propose another 'semi-parametric' method for fMRI analysis: the use of a very general class of functional forms to build more flexible models (Bishop, 1995). The proposed method uses a feed-forward layered artificial neural network (ANN) to describe a non-linear dynamic system of hemodynamic response. The recent history of events is used as the input, and the BOLD signal from a particular voxel is used as the supervised signal. This type of artificial neural network is called a multi-layer perceptron (MLP). It is known that, given a sufficient number of hidden units, an MLP with one or more hidden layers whose output functions are sigmoid functions can approximate any continuous function to any degree of accuracy (Funahashi, 1989, Hornik et al., 1989). Thus this method can perform a non-linear regression analysis between BOLD signals and events without explicit modeling of the response function.
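As a concrete illustration of feeding the recent history of events to the network, the sketch below builds one input vector per scan from the preceding events. This is a minimal NumPy illustration; the window length of 16 scans and the event timings are assumptions for demonstration, not values from the study.

```python
import numpy as np

def event_history_inputs(events, n_lags):
    """Build one network input vector per scan from the recent event
    history: row t holds events[t], events[t-1], ..., events[t-n_lags+1]."""
    T = len(events)
    X = np.zeros((T, n_lags))
    for lag in range(n_lags):
        # Column `lag` is the event train shifted forward by `lag` scans
        X[lag:, lag] = events[:T - lag]
    return X

# Illustrative event train: 1 at scan indices where an event occurred
events = np.zeros(100)
events[[10, 40, 70]] = 1.0
X = event_history_inputs(events, n_lags=16)  # one input vector per scan
```

Each row of X can then serve as the input vector for the network, with the BOLD signal at the same scan as the supervised signal.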
In the MLP, the hidden units receive the input vector x_in multiplied by an adjustable weight matrix W_hidden-in, plus a bias vector b_hidden. Each hidden unit applies a sigmoid transfer function (tanh) and passes its output to the output layer:

x_hidden = tanh(W_hidden-in x_in + b_hidden)

At the output layer, the hidden outputs x_hidden are multiplied by an adjustable weight matrix W_out-hidden and a bias b_out is added, giving the network output y:

y = W_out-hidden x_hidden + b_out

Thus, an MLP approximates the regression function y = f(x_in) by a superposition of sigmoid functions with adjustable input weights. Hereafter, we call this regression the ANN regression.
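The forward pass of the two layers described above can be written compactly. The sketch below is a NumPy illustration only; the layer sizes and random initial weights are assumptions for demonstration, not trained values.

```python
import numpy as np

def mlp_forward(x_in, W_hidden_in, b_hidden, W_out_hidden, b_out):
    """Single-hidden-layer MLP:
    y = W_out-hidden * tanh(W_hidden-in * x_in + b_hidden) + b_out."""
    # Hidden layer: weighted inputs plus bias, through the tanh sigmoid
    x_hidden = np.tanh(W_hidden_in @ x_in + b_hidden)
    # Output layer: linear combination of hidden activations plus bias
    return W_out_hidden @ x_hidden + b_out

rng = np.random.default_rng(0)
n_in, n_hidden = 16, 8  # illustrative sizes, not values from the study
W1 = rng.normal(scale=0.1, size=(n_hidden, n_in))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.1, size=(1, n_hidden))
b2 = np.zeros(1)
y = mlp_forward(rng.normal(size=n_in), W1, b1, W2, b2)  # network output
```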
The ANN regression is known to produce results equivalent to the Volterra kernel method (Volterra, 1959) without explicit definition of the kernel functions (Wray and Green, 1994). Friston et al., 1998b, Friston et al., 2000 have already applied the Volterra series approach to fMRI analysis; in their approach, the kernel functions are explicitly modeled (Friston et al., 2000). From the viewpoint of an MLP, their approach can be seen as using a small set of pre-assumed transfer functions at the hidden layer and only a restricted number of input-hidden connections. In contrast, the proposed ANN regression assumes no explicit form of the response functions and uses fully adjustable connections between the input and hidden layers. Because an adequately trained network is equivalent to the Volterra series expansion with all the dimensions of that system (Wray and Green, 1994), the ANN regression provides more flexibility than a restricted set of Volterra kernels and can describe any continuous relation between the input and output series.
The ANN regression analysis has another advantage over non-parametric methods: it can detect any type of response modulation by the input values. This type of analysis, known as parametric modulation analysis, addresses not only whether activation exists, but also how that activation is modulated by the parameters of the event. To detect parametric modulation with non-parametric methods, we have to explicitly model the shape of the modulation; otherwise only linear modulations can be detected. The ANN regression, in contrast, does not require explicit modeling of the modulation shape. Furthermore, any continuous modulation can be detected, owing to the ability of the ANN regression to approximate any continuous function.
This paper presents a method of ANN regression that can be applied to fMRI studies. In fMRI analyses, the autocorrelation noise in the BOLD signal time course (Bullmore et al., 1996, Purdon and Weisskoff, 1998, Woolrich et al., 2001) may be a problem for the ANN regression, because the network can fit arbitrary relations between events and BOLD signals. We examined this problem by analyzing null-task fMRI data and synthetic white noise data. In the following section, we describe the application of the ANN method to a practical case: a memory-guided saccade task. Because this task evokes responses of various shapes, it is a good example of how the ANN method can fit them. Finally, to explore the potential of the ANN method, a parametric modulation analysis was performed: using a synthetic data set, we compared the ability of the ANN method and a non-parametric method to detect non-linear parametric modulations.
Analysis methods
Our analysis used a multi-layer ANN to regress fMRI signals on a history of recent events. The ANN is a network of simple processing nodes connected by adjustable weights (Hertz et al., 1991). The model is inspired by biological neural networks: the architecture mimics their parallel processing, and the node transfer function provides a simple model of neural activation. Although it has been used for modeling various perceptual and cognitive processes in…
ANN fitting to white noise and null fMRI data
When the ANN regression is applied to fMRI data, the autocorrelation noise often present in a BOLD signal time course (Bullmore et al., 1996) may become a problem. Although we were able to avoid over-fitting by using the cross-validation procedure described above, an ANN may still fit autocorrelation noise because it is not white noise. In this section, we examine how an ANN fits noise signals, i.e., synthetic white noise and a null fMRI signal (no task was applied and the subject rested…)
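The two kinds of noise signal examined here can be synthesized as follows. This sketch assumes a first-order autoregressive (AR(1)) model as a stand-in for autocorrelated BOLD noise; the coefficient 0.4 and the series length are illustrative choices, not values from the study.

```python
import numpy as np

def ar1_noise(T, phi, sigma=1.0, rng=None):
    """Synthetic autocorrelated noise: n[t] = phi * n[t-1] + e[t]."""
    if rng is None:
        rng = np.random.default_rng(0)
    e = rng.normal(scale=sigma, size=T)
    n = np.zeros(T)
    n[0] = e[0]
    for t in range(1, T):
        n[t] = phi * n[t - 1] + e[t]
    return n

rng = np.random.default_rng(1)
white = rng.normal(size=2000)             # white-noise control
auto = ar1_noise(2000, phi=0.4, rng=rng)  # autocorrelated noise

# Lag-1 sample autocorrelation: near 0 for white noise, near phi for AR(1)
r_white = np.corrcoef(white[:-1], white[1:])[0, 1]
r_auto = np.corrcoef(auto[:-1], auto[1:])[0, 1]
```

Unlike white noise, the AR(1) series carries structure across time points, which is exactly what a sufficiently flexible regressor might mistake for an event-locked response.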
Application to practical data: memory-guided saccade task
In an earlier section, we showed that the ANN did not fit the autocorrelation noise any more than it fit white noise. However, the ANN may fit noise as well as activation signals. In this section, we describe the use of a practical task experiment to examine whether activation could be discriminated from noise in the ANN regression result.
In the experiment, a memory-guided saccade task (Funahashi et al., 1989, Funahashi et al., 1991, Hikosaka and Wurtz, 1983) was performed. The subject was…
Parametric modulation analysis using an ANN regression and non-parametric methods
One advantage of the ANN analysis over non-parametric methods is in detecting responses modulated by the stimulus parameters in a parametric modulation analysis (e.g., Pinel et al., 2001, Riecker et al., 2003, Coull et al., 2004). This analysis tells us not only ‘which area’ of the brain was activated by the stimulus, but also ‘how’ that activation changed with the stimulus parameters. Though conventional non-parametric methods can describe response modulations, these methods…
General discussion
This paper examined the application of an artificial neural network (ANN) regression analysis to fMRI data. To reduce computational time, we used the RPROP algorithm (Riedmiller, 1994, Riedmiller and Braun, 1996), and to avoid over-fitting we used early stopping with cross-validation. The results of fitting to synthetic white noise and null fMRI data indicated that the autocorrelation noise intrinsic to most fMRI time courses is less problematic for the ANN regression.
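The sign-based weight update at the heart of RPROP can be sketched as follows. This is a simplified illustration of the adaptation rule (without weight-backtracking); the hyperparameter values are the commonly cited defaults, and the quadratic toy loss is an assumption for demonstration, not part of the study.

```python
import numpy as np

def rprop_step(grad, prev_grad, step, eta_plus=1.2, eta_minus=0.5,
               step_min=1e-6, step_max=50.0):
    """One RPROP update: adapt each weight's step size from the sign of
    its gradient, ignoring the gradient magnitude."""
    sign_change = grad * prev_grad
    # Same gradient sign as last time: grow the step; sign flip: shrink it
    step = np.where(sign_change > 0, np.minimum(step * eta_plus, step_max), step)
    step = np.where(sign_change < 0, np.maximum(step * eta_minus, step_min), step)
    # Move each weight opposite to its gradient sign by its own step size
    return -np.sign(grad) * step, step

# Toy demonstration: minimize the quadratic loss sum(w**2)
w = np.array([5.0, -3.0])
step = np.full_like(w, 0.1)
prev_grad = np.zeros_like(w)
for _ in range(100):
    grad = 2.0 * w  # gradient of sum(w**2)
    dw, step = rprop_step(grad, prev_grad, step)
    w, prev_grad = w + dw, grad
```

Because only the gradient sign is used, the update is insensitive to the widely varying gradient magnitudes encountered during back-propagation, which is the source of the speed-up.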
Acknowledgment
We would like to thank Dr. Makoto Kato for his helpful comments and technical assistance with the memory-guided saccade task.
References (48)
- et al. The variability of human, BOLD hemodynamic responses. NeuroImage (1998)
- et al. Ambiguous results in functional neuroimaging data analysis due to covariate correlation. NeuroImage (1999)
- et al. Characterizing stimulus-response functions using nonlinear regressors in parametric fMRI experiments. NeuroImage (1998)
- Orthogonal polynomial regression for the detection of response variability in event-related fMRI. NeuroImage (2002)
- et al. Event-related fMRI: characterizing differential responses. NeuroImage (1998)
- et al. Nonlinear responses in fMRI: the Balloon model, Volterra kernels, and other hemodynamics. NeuroImage (2000)
- On the approximate realization of continuous mappings by neural networks. Neural Netw. (1989)
- et al. Thresholding of statistical maps in functional neuroimaging using the false discovery rate. NeuroImage (2002)
- et al. Multilayer feedforward networks are universal approximators. Neural Netw. (1989)
- et al. Human precentral cortical activation patterns during saccade tasks: an fMRI comparison with activation during intentional eyeblink tasks. NeuroImage (2003)
- Efficiency, power, and entropy in event-related fMRI with multiple trial types. Part II: design of experiments. NeuroImage
- Efficiency, power, and entropy in event-related fMRI with multiple trial types. Part I: theory. NeuroImage
- A new statistical approach to detecting significant activation in functional MRI. NeuroImage
- Characterizing the hemodynamic response: effects of presentation rate, sampling procedure, and the possibility of ordering brain activity based on relative timing. NeuroImage
- Modulation of parietal activation by semantic distance in a number comparison task. NeuroImage
- Automatic early stopping using cross validation: quantifying the criteria. Neural Netw.
- Parametric analysis of rate-dependent hemodynamic response functions of cortical and subcortical brain structures during auditorily cued finger tapping: a fMRI study. NeuroImage
- Comparison of detrending methods for optimal fMRI preprocessing. NeuroImage
- Temporal autocorrelation in univariate linear modeling of fMRI data. NeuroImage
- Speed improvement of the back-propagation on current generation workstations
- Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc.
- Neural Networks for Pattern Recognition
- Statistical methods of estimation and inference for functional MR image analysis. Magn. Reson. Med.
- Functional anatomy of the attentional modulation of time estimation. Science