1 Introduction

Functional Magnetic Resonance Imaging (fMRI) is one of the most common imaging techniques for detecting the activation levels of human brain, during a cognitive process. fMRI measures the change of oxygen level in the brain with respect to neural activities. In principle, oxygen dependencies of neuron groups fluctuate in accordance with the activation and MRI machines can detect those changes through the scan. An intensity value is recorded at each 1–2 s for a neuron group called voxel. Each voxel is a cubic volume element around 1–2 \(\mathrm{mm}^{3}\) size. Classification of the cognitive stimulus from the voxel intensity values are called brain decoding and the pioneering studies in this area are called Multi Voxel Pattern Analysis (MVPA) [9, 11]. MVPA involves recognizing the cognitive states represented by voxel intensity values of fMRI data, using machine learning techniques. A set of features is extracted from voxel intensity values recorded during each cognitive task. However, due to the large feature space formed by voxels (about 100,000–200,000 voxels per brain volume), dimension reduction techniques are required, such as, clustering the voxels groups into homogeneous regions.

Anatomical regions, defined by experimental neuroscience can be used as brain parcels. In most common approach, called AAL, there are 116 major regions and each region is assumed to contain voxel groups which work together. In order to reduce the dimension of the feature space, representative signals can be selected for each region or average time series can be computed within each region [1, 16]. However, anatomical regions lose the subject-specific and task dependent information of brain activities. Besides, sizes of the regions vary extremely and activation levels of voxels may not be homogeneous within an anatomic region.

In order to partition the voxels into a set of homogeneous regions, well-defined clustering methods such as k-means [6, 7, 10], hierarchical clustering [1, 4], and spectral clustering [17, 20] can be used. The pros and cons of these clustering algorithms are widely studied in fMRI literature on a variety of datasets [8, 18]. Some studies bring spatially close voxels together considering only the location information in analogy with the AAL [6]. Although this method improves the strict norms of AALs, it lacks the functional similarity of voxel time series, which belongs to the same regions. Recent literature reveals that functionally close voxels tend to contribute to the same cognitive task, thus, form homogeneous regions. Therefore, one needs to bring both functionally similar and spatially contiguous voxels together to define homogeneous brain regions [21]. Similarly, Wang et al. suggest to combine n-cut segmentation algorithm with simple linear iterative clustering (SLIC) [21]. Blumensath et al. use region growing for brain segmentation with functional metrics and spatial constraints between samples [3]. Bellec et al. also use region growing with functional metrics within the 26 spatial neighborhood in 3-Dimensional space [2]. Background on neuronal activity, also, supports this idea, such that physically close neurons are in chemical interaction with each other and this interaction can be interpreted as functional similarity. With these objectives in mind, many different clustering algorithms are applied to create data dependent homogeneous brain parcels. Depending on the predefined distance measure, the clustering algorithms can group spatially or functionally similar voxels under the same cluster. Craddock et al. adopt this idea and propose a brain parcellation method, in which they represent the voxels in a graph structure and used n-cut on a spatially constrained brain graph with functional edges [5]. In order to achieve spatial contiguity they connected each voxel to its 26 closest neighbors in 3D space. On the other hand, to accomplish functional homogeneity, they set edge weights of the graph to the correlation between the time series of two voxels as follows;

$$\begin{aligned} e_{i,j} = {\left\{ \begin{array}{ll} corr(\mathbf {v_{i}},\mathbf {v_{j}})&{}, dist(\mathbf {v_{i}},\mathbf {v_{j}}) \le d_{t} \\ 0&{}, otherwise, \end{array}\right. } \end{aligned}$$
(1)

where \(d_{t}\) is selected to be \(\root \of {3}\) and \(corr(\mathbf {v_{i}},\mathbf {v_{j}})\) is the Pearson Correlation between the intensity values of voxels \(\mathbf {v_i}\) and \(\mathbf {v_j}\). They, also, remove the edges with correlation values less than 0.5 to reduce the weak connections. Then, they define a brain graph \(G = (V,E)\), where the set of voxels \(V = [\mathbf {v_{1}}, \mathbf {v_{2}} \ldots \mathbf {v_{N}}]\) are the nodes of the graph, and \(E = [e_{1,1}, e_{1,2}, \dots e_{N,N}]\) are the edge weights computed according to Eq. 1. They partition the graph G into subgraphs by removing the edges iteratively using N-cut segmentation method using the following formula,

$$\begin{aligned} N\_cut = \frac{\sum _{\mathbf {v_{i}}\in A,\mathbf {v_{j}}\in B}e_{i,j}}{\sum _{\mathbf {v_{i}}\in A,\mathbf {v_{n}}\in V}e_{i,n}} + \frac{\sum _{\mathbf {v_{i}}\in A,\mathbf {v_{j}}\in B}e_{i,j}}{\sum _{\mathbf {v_{j}}\in B,\mathbf {v_{n}}\in V}e_{j,n}}. \end{aligned}$$
(2)

As it is mentioned above, conventional MVPA methods create features sets from the selected voxel intensity values or use some averaging techniques to represent each brain region. This approach is quite restrictive to represent cognitive states. Recent studies suggest to model the relationships among voxels rather than using voxel intensity values. Ozay et al., demonstrate this idea by suggesting the Mesh Model which is a graph structure that identifies the connectivity among voxels [15].

Mesh Model (MM) represents intensity values of voxels as a weighted linear combination of its neighboring voxels, defined on a neighborhood system. The estimated weights represent the arc weights between the voxels and the voxels represents a node in the overall brain graph. A star mesh is formed around each voxel and its p neighbors, independently. In each mesh, the voxel at the center is called seed-voxel and the surrounding voxels are called neighbors. p nearest neighbors of voxel \(\mathbf {v_{i}}\) for cognitive stimulus k are shown as \(\eta _{\mathbf {v_{i}}(k)}^{p}\) and they can be selected spatially (Spatial Mesh Model - SMM) [12, 14] or functionally (Functional Mesh Model - FMM) [12, 13] such that, spatial neighbors has the smallest Euclidean distance with the seed-voxel whereas functional neighbors has maximum functional similarity. Meshes are formed using the full length time series for voxels, recorded during an fMRI experiment session. Assuming s measurements are taken for each cognitive stimulus, time series of a voxel \(\mathbf {v_{i}}\) for stimulus k is an s dimensional vector shown as \(\mathbf {v_{i}}(k) = [\mathbf {v_{i}}(k)^{1}, \mathbf {v_{i}}(k)^{2}, \ldots \mathbf {v_{i}}(k)^{s}]\). Spatial Mesh Model (SMM) selects the neighbors according to the physical distances among voxels in 3-dimensional space by Euclidean distance [12, 14]. On the other hand, Functional Mesh Model (FMM), proposed by Onal et al., selects functional neighbors with the highest p-correlation values obtained by Pearson Correlation [12, 13]. Afterwards, time series of the seed voxel is represented as a weighted combination of its neighbors by the following equation for each cognitive stimuli:

$$\begin{aligned} \mathbf {v_{i}}(k) = \sum _{\mathbf {v_{j}}(k) \in \eta _{\mathbf {v_{i}}(k)}^{p}} a_{i,j,k} \mathbf {v_{j}}(k) + \epsilon _{i,j}, \end{aligned}$$
(3)

where \(\eta _{v_{i}(k)}^{p}\) is the p nearest neighbors of voxel \(v_{i}\) for sample k and \(a_{i,j,k}\) are the arc weights of the mesh network between the voxels and they are called Mesh Arc Descriptors (MADs). MADs are estimated using regularized Ridge regression method by the minimization of error term \(\epsilon _{i,j}\). Concatenating each MAD for each voxel and cognitive task creates a new feature space and classification is performed on this feature space.

In this study we combine classical brain parcellation approach proposed by Craddock et al. and Mesh Model and propose a novel brain parcellation algorithm, called BrainParcel. Unlike current methods, we partition the graph formed by star meshes and partition this graph into brain regions. We show that brain partitions obtained by BrainParcel have better representation power than the partitions obtained by state of the art clustering methods and AAL in cognitive state classification problem.

Fig. 1.
figure 1

Overall architecture of the BrainPacel algorithm.

2 BrainParcel

Brain parcel is a brain partitioning algorithm that uses graph theoretic approaches. First, we form a brain graph by ensembling the meshes estimated around each voxel. Then, we partition this graph using n-cut segmentation algorithm. Each region is represented by the average time series of all voxels in that region. Then, these representative time series are fed to a machine learning algorithm to classify the underlying cognitive states. Figure 1 indicates the stages of suggested BrainParcel method for brain decoding problem. Each stage is explained in the following subsections.

2.1 Neighborhood System

In order to estimate a star mesh around each voxel independently, we need to define a neighborhood system. The concept of neighborhood takes an important place in this study. We inspire from the biological structure of human brain, where spatially close neurons act together by means of some electro-chemical interactions. Additionally, experimental evidence indicates that physically far apart neurons may contribute to the same cognitive process through the brain connectome. We try to utilize these observations in our brain parcellation model by defining a neighborhood system around each voxel and employ multiple connections between the neighboring voxels.

Neighborhood of the \(i^{th}\) voxel \(\mathbf {v_{i}}\), is defined as the set of voxels that are closest to \(\mathbf {v_{i}}\) according to a predefined rule. Assuming \(p_c\) many neighbors around a voxel, neighborhood of \(\mathbf {v_{i}}\) is represented by \(\eta _{v_{i}}^{p_c}\).

Letting N be the number of voxels, we define an \(N-by-N\) adjacency matrix, ND, to represent the neighborhood of voxels. Each entry of ND is calculated as follows;

$$\begin{aligned} ND(i,j) = {\left\{ \begin{array}{ll} 1 &{}, v_{j} \in \eta _{v_{i}}^{p_c} \\ 0 &{}, otherwise. \end{array}\right. } \end{aligned}$$
(4)

In this study, we define two types of neighborhood, given below:

Spatial neighborhood \(\eta _{v_{i}}^{p_c}\) is defined as the set of voxels, which has the \(p_c\) smallest Euclidean distance in 3-Dimensional space to voxel \(\mathbf {v_{i}}\). This neighborhood system ensures resulting brain parcels to be spatially contiguous.

Functional neighborhood \(\eta _{v_{i}}^{p_c}\) is defined as the set of voxels, which has the highest \(p_c\)-Pearson Correlation to voxel \(\mathbf {v_{i}}\). This neighborhood system connects functionally similar voxels, even if they are physically apart from each other.

Note that, selection of the number of neighbors, \(p_c\), and the type of the neighborhood system highly effects the rest of the steps of BrainParcel. Specifically, functional neighborhood relaxes the spatial similarity, selecting the neighboring voxels which are physically far apart. Therefore, the resulting brain parcels are not guaranteed to be spatially contiguous. It is very crucial to define a sort of balance in these two types of neighborhood, so that the resulting brain parcels consist of functionally similar and spatially contiguous voxels.

2.2 Extracting Mesh Arc Descriptors (MADs) Among Voxels

Each voxel is connected to its neighboring voxels according to one of the above neighborhood systems to form a star mesh around that voxel. The structure of star mesh depends on the type of the neighborhood system defined above. The arc weights of each local mesh are estimated by adopting the mesh model of Onal et al. [12,13,14]. As opposed to the current studies, we form the meshes, based on the complete time series recorded at each voxel rather than forming a different mesh for each cognitive task. This approach enables us to form a shared brain partition across all of the cognitive tasks

fMRI technique collects a time series for each voxel, when the subject is exposed to a cognitive stimulus. In the case of a block experiment design, which we have used, subjects are exposed to a stimulus for a specific time interval and the voxel time series over the entire brain volume are collected. Then, after a rest period, another stimulus is given to the subject. The time series recorded during a stimulus at \(i^{th}\) voxel is represented by the vector \(\mathbf {v_{i}}\). Based on the idea of mesh model, we represent each \(\mathbf {v_{i}}\) as weighted sum of other voxels in the \(\eta _{v_{i}}^{p_c}\) neighborhood of \(\mathbf {v_{i}}\) according to Eq. 3. Notice that Mesh Arc Descriptors (MADs) for classification are calculated per cognitive stimulus. However, we compute MADs from the entire time series of the voxels. Therefore, k index, which indicates a specific cognitive task, is removed from Eq. 3, since we compute MADs for the entire time duration of fMRI recordings. This representation is carried with a linear equation by the following formula;

$$\begin{aligned} \mathbf {v_{i}} = \sum _{v_{j} \in \eta _{v_{i}}^{p}} a_{i,j} \mathbf {v_{j}} + \epsilon _{i,j}. \end{aligned}$$
(5)

Weights of the representation, called Mesh Arc Descriptors (MADs) are shown as \(a_{i,j}\) and are estimated by Regularized Ridge Regression which minimizes the mean squared error \(\epsilon _{i,j}^2\) [12,13,14].

2.3 Constructing a Voxel-Level Brain Graph

In order to construct a brain graph from the estimated MADs, we ensemble all the local meshes under the same graph, \(G_m = (V,E_m)\). The set of nodes of this graph correspond the set of voxels \(V = [\mathbf {v_1},\mathbf {v_2}, \ldots \mathbf {v_N}]\). The set of edges corresponds to set of all MADs, \(a_{i,j}\epsilon E_m\). Note that, since \(a_{i,j}\ne a_{j,i}\), the graph \(G_m\) is directed. On the other hand, the graph partitioning methods, such as n-cut requires undirected graphs, in which each edge weight, \(e_{i,j}\) is represented by a scalar number. In order to obtain an undirected graph from the directed graph \(G_m\), a set of heuristic rules are used. Suppose that the mesh is formed for the voxel \(\mathbf {v_{i}}\), and \(\mathbf {v_{j}}\) is in the neighborhood of \(\mathbf {v_{i}}\) with mesh arc-weight \(a_{i,j}\). Edge value \(e_{i,j}\) is determined, based on the following rules:

  • Case 1: IF \( \mathbf {v_{i}} \notin \eta _{v_{j}}^{p_c}\) AND \(\mathbf {v_{j}} \in \eta _{v_{i}}^{p_c}\) THEN \(e_{i,j} = a_{i,j}\)

  • Case 2: IF \(\mathbf {v_{i}} \in \eta _{v_{j}}^{p_c}\) AND \(\mathbf {v_{j}} \in \eta _{v_{i}}^{p_c}\) THEN this case requires further analysis. Assuming highly correlated voxels should have a stronger edge between them, we employ the following thresholding method;

    IF \(corr(v_{i}, v_{j}) \ge 0\), THEN \(e_{i,j} = max(a_{i,j},a_{j,i})\)

    IF \(corr(v_{i}, v_{j}) < 0\), THEN \(e_{i,j} = min(a_{i,j},a_{j,i}).\)

Above rules prune the directed graph \(G_m\) to an undirected graph G to be partitioned for obtaining homogeneous brain regions, called supervoxels.

2.4 Graph Partitioning for Obtaining Supervoxels

After constituting the brain graph G, n-cut segmentation method is used for clustering this graph. N-cut is a graph partitioning algorithm which carries a graph cut method on a given undirected graph. Given G, n-cut cuts the edges one by one in an iterative manner. With each cut, the graph is split into two smaller connected components. Letting N be the number of voxels, n-cut method requires the representative graph G, which is actually an \(N-by-N\) adjacency matrix explained in the previous sections. The number of intended brain parcels is set to \(\mathbb {C}\). With graph cut operations, graph is split into C connected components where \(C \le \mathbb {C}\). Each sample is a member of one of this clusters and assigned with a cluster index. In other words, n-cut method returns an \(1-by-N\) dimensional vector \(\mathbb {L_{C}} = [l_{1}^{c}, l_{2}^{c}, \ldots , l_{N}^{c}]\) where each \(l_{i}^{c}\) is a number between 1 and C. The n-cut method, as applied to undirected graph G is called BrainParcel. The output of this algorithm yields a set of supervoxels, which are homogeneous with respect to the subgraphs of mesh network.

Recall that, anatomical regions (AAL) produce an experimentally neuroscientific parcellation of the brain. In order to compare the brain decoding performances, we conducted our experiment, where we form mesh network for both among anatomical regions and the network formed among supervoxels obtained at the output of BrainParcel. There are 116 basic brain regions in AAL and each voxel resides in one and only one region. Let us represent the anatomically defined region indices of voxels with \(\mathbb {L_{A}} =[l_{1}^{a}, l_{2}^{a}, \ldots , l_{N}^{a}]\), in order to avoid confusions. Notice that, with \(\mathbb {L_{A}}\) we skip all of the brain parcellation steps. Also, let us call \(\mathbb {L} =[l_{1}, l_{2}, \ldots , l_{N}]\) to all kinds of brain segmentations; in our case it means \(\mathbb {L} \supset (\mathbb {L_{C}} \cap \mathbb {L_{A}})\).

2.5 Representation of Supervoxels

We need to calculate a representative signal for each supervoxel. For this purpose, we take an average among the time series of voxels within each supervoxel. With C supervoxels, we calculate set of vectors \(U = [\mathbf {u_{1}},\mathbf {u_{2}}, \ldots , \mathbf {u_{C}}]\), where each \(u_{i}\) is the representative vector of supervoxel i and they are calculated as follows;

$$\begin{aligned} \mathbf {u_{i}} = \frac{\sum _{l_{j} == i} \mathbf {v_{j}}}{\sum _{l_{j} == i} 1}. \end{aligned}$$
(6)

In the dataset on which we have performed our experiments, six measurements were taken for each cognitive stimulus. Assuming K stimuli were shown to each subject, time series of each voxel has a length \(\mathbb {K} = 6K\). Therefore, at the output of the clustering algorithm we construct a data matrix U of size \(C-by-\mathbb {K}\), where each row represents a feature, and each column corresponds to a cognitive stimulus.

2.6 Constructing Supervoxel-Level Brain Graph

The original area of utilization of the mesh model was to model the relationships among voxels and use this relationship for decoding the cognitive processes. Both spatial and functional neighborhoods were considered, and their representation powers were demonstrated by relatively high recognition performances compared to the available state of the art network models. Specifically, Functional Mesh Model (FMM) outperform most of the MVPA and Spatial Mesh Model (SMM) results. Therefore, we use FMM for classifying the cognitive states.

Data matrix U, defined in the previous section, is feed into the FMM algorithm. Each supervoxel \(\mathbf {u_{i}}\) is represented by linear combination of its functional neighbors, the arc weights are estimated at each mesh using Ridge Regression for each cognitive stimulus. Recall that fMRI collects multiple measurements during the time course of each cognitive stimulus. In our dataset 6 measurements are collected for each stimulus, and \(\mathbf {u_{i}}\) is a vector of length \(\mathbb {K} = 6K\) for K stimuli. Let us represent the vector of the stimulus k by \(\mathbf {u_{i}(k)}\). First, functionally closest \(p_{m}\) neighbors of \(\mathbf {u_{i}(k)}\); \(\eta _{u_{i}(k)}^{p_{m}}\), are selected from the supervoxels \(\mathbf {u_j}\) which has the highest correlation with supervoxel \(\mathbf {u_i}\) according to Pearson Correlation. Then, the mean square error \(E(\epsilon _{i,j}^2)\) is minimized to estimate \(a_{i,j,k}\) of the Eq. 3. Estimated MADs are concatenated so that they represent a more powerful feature space compared to the raw fMRI signal intensity values that is used in MVPA studies. We concatenate all the MADs and represent the stimulus in a feature space formed by MADs.

2.7 Classification

MADs estimated at supervoxel-level, are concatenated under a feature vector for classifying the cognitive states. 6 fold cross validation schema is applied on the dataset, where at each fold, one run is reserved from the data as a test set. Logistic regression is used for classification.

3 Experiments

3.1 Dataset

In this study, we use a dataset called “Object Experiment”. This dataset is recorded by the team of ImageLab of METU members at Bilkent University UMRAM Center. It consists of 4 subjects in the age of twenties. Each subject is shown various bird and flower pictures. In between those stimuli, simple mathematics questions are shown as transition. There are 6 runs in the experiment and in each run, 36 pictures are shown to each subject. Thus, there are total of 216 samples. Number of samples are balanced for the two classes (bird and flower). Preprocessing of the dataset is carried with the SPM toolbox and the number of voxels is decreased to 20,000 for each subject. Also, there are 116 labeled anatomical regions, defined under MNI coordinate system [19]. We provide experimental results, where each given accuracy is the output of a six fold cross validation. Recall that, each subject is given 6 runs of stimuli. At each fold, we split a run for testing and use the other 5 runs for training. The reported accuracies are the average of these 6 folds for each subject.

3.2 Comparative Results

In this section, we provide a comparison between BrainParcel and the parcellation algorithm suggested by Craddock et al. Table 1 shows the classification performances for various number of parcels. The results are reported after optimizing the mesh sizes empirically. Recall that, functionally constrained systems that construct the graph with \(Functional\_ND\) neighborhood system does not provide any spatial integrity within the brain parcels, since the brain graph is not formed on these grounds. On the other hand, spatially constrained systems provide both spatial continuity and functional homogeneity since the brain graph is formed by spatial restrictions and edges are weighted in terms of functional connectivity.

In Table 1, the first and third column give the best results for the brain parcellation method suggested by Craddock et al. (called classical, in the Table), and the other two gives the results obtained by BrainParcel that we have proposed. Each row of this table gives the results for a different number of supervoxels (SV). Notice that spatially constrained BrainParcel gives the best classification performances in the overall schema.

These results point to the idea that, in order to achieve better representational power for cognitive state classification, one needs spatially contiguous and functionally homogeneous brain parcels, which is accomplished by spatial BrainParcel. Moreover, recall that we have offered BrainParcel as an alternative to AAL, which has 116 basic anatomic regions and gives 53% performance on average. A compatible parcellation scheme consists of 100 super voxels, where, Spatial BrainParcel results in higher classification accuracies compared to the other methods.

Table 1. Overall 2-class classification accuracy acquired from the MADs constructed among super voxels and method suggested by Craddock et al. (called, classical). These results suggest that Spatial BrainParcel gives higher performances, since it provides spatial continuity and functional homogeneity within each brain parcel.

4 Conclusion

In this study, we offer a brain parcellation methodology, which combines the spatial and functional connectivity of brain on a novel graph representation. This approach offers a better alternative to the current brain parcellation methods in the literature [8, 18], for brain decoding problems. BrainParcel uses spectral clustering methods, which represents the voxel space as a graph formed by mesh model. Common studies compute the edge weights of the brain graph as the pairwise correlation between voxels, whereas we computed the edge weights by estimating them using the mesh model among a group of voxels. Then, brain graph is partitioned with n-cut segmentation method to generate supervoxels.

As suggested, using task dependent brain parcellation methods enable better brain decoding performances compared to anatomical regions. Moreover, it is demonstrated that functional connectivity, united with the spatial contiguity is the best approach to represent homogeneous brain regions.

Also, results show that using the MADs of the mesh model for classification, improves the brain decoding performances in all of the experiment setups.

Our study reveals that mesh model not only improves the classification performance, but also creates a brain graph, where the nodes represent homogeneous super voxels with a better representation power for brain decoding. Although the performance increase looks relatively small, when the large size of the data set is considered, the performance boost becomes quite meaningful.

In the future, experimental set up can be refined for parameter selection.