Keywords

1 Introduction

Autism spectrum disorder (ASD) is a neuro-developmental disorder, and it is associated with early-emerging social and communication impairments, in addition to rigid and repetitive patterns of behavior and interests [1, 2]. Although there are no well established causative factors explaining ASD [3], alterations in functional activity patterns in brain are believed to play important role in explaining and diagnosing ASD [4]. One of the most widely used methods to measure the brain activity is the functional MRI (fMRI) [5, 6]. The fMRI is categorized into two main types: resting state fMRI (RfMRI) and task based MRI. In the RfMRI, patients are not asked to do any task, they are only asked to stay awake with open eyes and not to move in the scanner, and Blood Oxygenated Level Dependant (BOLD) signal is recorded overtime [7]. In the task based experiment patients are asked to complete a certain task and the spontaneous low-frequency fluctuations in the BOLD signal in response to this task is recorded [8]. The main purpose of the RfMRI studies is to identify the alternations in the functional connectivity patterns between two groups (i.e., patients VS controls), and it is widely reported in the literature that ASD is associated with such alteration [9, 10].

A wide range of studies were concerned about reporting statistical group differences between autistic subjects (ASDs) and typically developed subjects (TDs). In [9], the default mode network activity alternation was examined between 16 ASDs and 15 TDs, where ASDs showed weaker connectivity and this weaker connectivity was positively correlated with ASD main impairments, poorer social skills and increases in restricted and repetitive behaviors and interests, while stronger connectivity in multiple areas were correlated with communication. A more recent study in [10], used a dataset of 84 ASDs (42 males/42 females) and 150 TDs (75 males/75 females). Males with ASD showed patterns of hypo-connectivity, while females with ASD showed hyper-connectivity.

In addition to the importance of reporting group differences between the two groups -ASDs and TDs-, it is also important to try utilizing the RfMRI in autism diagnosis. Although it is a promising direction and initially yielded good results, only few studies were conducted to try diagnosing autism using RfMRI. Two recent examples of these studies are found in [11] and [12]. The main goal in [11] is to identify connectivity networks that are correlated with ASD symptoms severity. In [11], a balanced dataset of 20 ASD subjects and 20 TD subjects was used. Individual salience network maps were created per subject and those maps yielded classification accuracy of 78% between ASDs and TDs, in addition, the salience network was related to the symptoms of restricted and repetitive behaviors with Pearson correlation coefficient, \(R^2 =0.36\) and \(P =0.07\). Another approach for diagnosing using RfMRI was reported in [12], where a longitudinal study used a dataset of 59 subjects at age of 6 months with high familial risk for ASD. The reported functional connectivity at age of 6 months matched the behavioral scores at age of 24 months which demonstrates the power of RfMRI in early ASD diagnosis.

In this study, we are introducing a large scale personalized diagnosis system for ASD. We demonstrate the power of our approach in identifying the symptoms severity by correlating our personalized maps finding with Autism Diagnostic Observation Schedule (ADOS). To build a system with high generalization capability and with no need to run the RfMRI analysis (which is time consuming) for each new unseen subject, we divided the system into two phases, (i) defining areas of statistical significance using small subset of the data and (ii) building a deep learning based system that utilizes those areas in diagnosing unseen subjects and define neuro-circuits of highest correlation with ADOS scores. More details about the approach are discussed in the next sections.

2 Materials and Methods

In this study 156 subjects obtained from National Database of Autism Research (NDAR: http://ndar.nih.gov) are used. The dataset is selected to be balanced (78 ASDs and 78 TDs). The used data are obtained from a single study done at George Washington University [13]. All neuroimages were produced by a Siemens Magnetom TrioTim with a 3 T magnet with TR = 2 s, TE = 30 ms and flip angle 90\(^\circ \), to produce images with 3 mm pixel spacing and 4 mm slice spacing. Time to acquire 33 coronal slices spanning the entire brain was 2.01 s. The resting state data were recorded for approximately 6 min. All ASD subjects in the study have ADOS reports. The experiment in this study is divided into 3 main steps explained in details in the next subsections.

2.1 Step 1: RfMRI Preprocessing

Prior to RfMRI analysis, there are several preprocessing steps [14]. In this study, the preprocessing was done using SPM12 (http://www.fil.ion.ucl.ac.uk/spm) toolbox. The preprocessing pipeline includes four main steps: (i) image realignment for motion correction, (ii) slice timing correction to overcome the effect of capturing samples at different times, (iii) image normalization to standard MNI152 space resampled every 2 mm, and finally (iv) Gaussian spatial smoothing using filter with full-width half-maximum of 6 mm.

2.2 Step 2: RfMRI Analysis and ROIs Extraction

In this step, a balanced subset of 40 subjects (20 ASDs and 20 TDs) are used to obtain the original regions of interest (ROIs). The ROIs were defined as the output components from the Independent Component Analysis (ICA) that showed statistical significance. To extract the ROIs of these 40 subjects, group ICA is used [15]. In the group ICA, time courses of the 40 subjects are concatenated and then decomposed into two matrices (i) group time course multiplied by (ii) group independent spatial components. To extract subject-specific time course and subject-specific spatial components, dual regression algorithm is used.

The intuition behind using ICA is the analogy between the RfMRI analysis and the blind source separation (BSS) problem [16], where it is required to recover set of statistically independent components with minimal error. The BSS problem can be formulated as:

$$\begin{aligned} x_i(t)=As_i(t)+\mu +\eta _i(t) \end{aligned}$$
(1)

Where \(x_i\) is the BOLD signal measured over time, \(s_i\) is the non-Gaussian source signal, \(\eta _i \sim N(0,\sigma ^2\sum _i)\), A is the mixing matrix, \(\mu \) is the mean of observations \(x_i\) and i is the an index over voxel space To solve this BSS problem it is required to find the unmixing matrix W such that

$$\begin{aligned} \hat{s}_i=Wx_i \end{aligned}$$
(2)

is a close approximation of the original measured signal. To estimate the unmixing matrix, the it is needed to optimize the rotation matrix Q in the whitened observations space:

$$\begin{aligned} \hat{s}=Wx=Q\tilde{x} \end{aligned}$$
(3)

Where:

$$\begin{aligned} \tilde{x}=(\varLambda _q-\sigma ^2Iq)^{-1/2}U^t_qx \end{aligned}$$
(4)

are the whitened observations, and \(U_q\) and \(\varLambda _q\) are the first q eigen-values of U and \(\varLambda \), U and \(\varLambda \) are the Singular Value decomposition matrices of the observations X and Q is qxq rotation matrix. To solve for the unmixing matrix, and based on the non-Gaussian sources constraint, the algorithm described in [17] is used, where the individual sources are calculated by projecting the the observations x onto the rows of Q, thus the \(r^{th}\) source is given by the iterative algorithm described in [17]:

$$\begin{aligned} \hat{q_r^t} \leftarrow \langle (x F'(\hat{s_r})- F''(\hat{s_r})\rangle \hat{q_r}\rangle \end{aligned}$$
(5)

where \(\hat{q_r}\) is the \(r^{th}\) row of Q and F is general nonquadratic function and \(F'\) is the derivative of F. To obtain all the rows of Q, this iterative approach is run for q times. For more mathematical details about the solution finding, uniqueness, correctness, and model order, the reader is referred to [15].

Fig. 1.
figure 1

The analysis of 4D-RfMRI data into group spatial components and group time course, then using dual regression to obtain subject specific time courses and subject specific spatial components

After completing the probabilistic ICA analysis of the group concatenated subjects, the output group spatial maps act as a set of spatial regressors and a General Linear Model (GLM) is used to find each individual subject time course. The variance normalized time courses are then fed to another GLM to obtain the subject specific spatial maps [18].

To get the significant ROIs from the output individual spatial maps, a non-parametric permutation test is used with two conditions, (i) ASD > TD and (ii) TD > ASD and 5000 permutations [19]. The output P values are then corrected using Bonferroni correction. Using \(\alpha \) level of 0.05 and after the Bonferroni correction, 34 components were found to be significant. The analysis in this study was performed using FSL5.0 package (https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/). The RfMRI analysis and ROIs extraction are illustrated in Fig. 1.

2.3 Step 3: Personalized Diagnosis for New Unseen Subjects

With the ROIs having been obtained in the previous step using a subset of 40 subjects, there is no need to repeat the entire pipeline analysis. Our proposed approach is to do the preprocessing steps then feed the new unseen subject to the dual regression to extract both spatial and time components. With this approach it is more feasible and more time efficient. In this study, the time courses and spatial maps of 116 subjects were obtained using the dual regression. These 116 subjects are then used in the personalized diagnosis system.

In the literature, the features used were the individual subjects spatial maps as in [11], but the major drawback of this approach is having number of features much greater than the number of subjects which increases the possibility of having under-determined system and increases the learning difficulty [11]. To overcome this problem in our study we used features derived from the individual temporal components. We used the power spectral density (PSD) of the individual time courses, and the reason behind this selection is the time shift invariance in the PSD. Another advantage in the PSD is the symmetry that allows use of only one half of the signal which increases the amount of features significantly.

Fig. 2.
figure 2

The personalized diagnosis pipeline. The subject specific time course is used to calculate PSD. PSDs are then feed to the deep learning based classification system and the output probabilities are then used for global diagnosis and correlation with ADOS reports

As shown in Fig. 2, for each of the 34 ROI, half the signal is fed to a deep learning based classification system which consist of an autoencoder for dimensionality reduction followed by a neural network for classification. To fine tune the model hyper-parameters, i.e., the sparsity proportion, the sparsity regulation, L2 regularization and number autoencoder hidden layers were obtained using a random heuristic search. The searching range for the sparsity proportion is defined between 0.05 and 0.9, for the sparsity reactualization is between 1 and 20, for the L2 regularization is between \(10^{-3}\) and \(10^{-6}\) and the number of hidden layers is between 10 and 100. The neural network used for classification is a single layer neural network with 10 hidden neurons. The global subject diagnosis is based on a winner-takes-all approach, where the components probabilities are used for voting and the winner class is considered as the global diagnostic decision.

To assess the statistical significance of the classifier, and to ensure its robustness, the significant areas were tested using bootstrapping. The labels of ASD and TD groups were randomly shuffled for 99 times and the classifier was trained and tested using LOSO with this completely uninformative data to ensure random performance in response to randomly shuffled labels.

In order to be able to correlate the significant areas of high accuracies with actual behavioral reports, an atlas defining 34 resting state networks is used. The used atlas is built using 4 local atlases having total of 34 cortical areas and these 4 local atlases are:

  1. 1.

    Parietal cortex atlas [20]: This local atlas, defines both functional connectivity and anatomical connectivity on both humans and macaques. It divides the the parietal cortex into 10 components.

  2. 2.

    Temporoparietal junction (TPJ) [21]: In this local atlas, TPJ is divided into 2 components: (i) anterior TPJ cluster, which showed interaction with ventral prefrontal cortex and anterior insula and (ii) posterior TPJ cluster which showed interaction with posterior cingulate, temporal pole, and anterior medial prefrontal cortex

  3. 3.

    Dorsal frontal cortex [22]: In this local atlas, the human dorsal frontal cortex is parcellated into 10 components They are all between the human inferior frontal sulcus and cingulate cortex

  4. 4.

    Ventral frontal cortex [23]: In this local atlas, the ventral frontal cortex was divided into 11 components, in addition to one more component from ventrolateral frontal pole. The spatial correlation was calculated between the output independent sources and each of the 34 areas defined in the atlas.

3 Experimental Results

ASD and TD subgroups were well-matched with respect to gender and age. Out of 78 ASD subjects, 40 were female (51%), while 43 of the 78 TD subjects were female (55%). The gender imbalance was statistically insignificant (\(\chi ^2 = 0.23\), \(p = 0.63\)). The mean age of ASD subjects was 13.6 years, while the mean age was 12.8 years for the TD group. The age difference is statistically insignificant (\(t = 0.458\), \(p = 0.647\)). In terms of IQ, the groups were less matched,but the difference in mean score is less than one standard deviation. For ASD group, the mean is 102 and the standard deviation is 21.2, while for the TD group, the mean is 111 and the standard deviation is 15.6.

3.1 Significant ROIs Personalized Diagnosis

Out of the 34 components, and after Bonferroni correction, 12 components were found to be statistically significant at \(\alpha =0.01\). These 12 components were then used in building the personalized diagnosis system. Three cross validation techniques are used, 4 folds, 10 folds and leave-one-subject-out. Table 1 shows the accuracies obtained for the 12 component using the 3 validation techniques, and the most correlated atlas area with each component. Also, Fig. 3 shows sample of personalized diagnosis maps for 2 ASDs and 2 TDs. The overall diagnostic accuracy, sensitivity and specificity after the components fusion are found to be 93%, 91% and 94% respectively.

Fig. 3.
figure 3

The personalized maps of 2 ASDs and 2 TDs. It is obvious that more areas are affected in the ASD cases than in TD cases.

Table 1. The Accuracies obtained for the 12 significant components using 4-folds, 10-folds and leave-one-subject-out (LOSO).

The result of bootstrap testing done by creating 99 versions of randomly shuffled label yielded a P value of 0.01 for each component diagnosis accuracies, which shows the statistical significance of the classifiers finding.

Table 2. Mapping between ADOS subscores, Research Domain Criteria (RDoC) neurocircuits, and functional connectivity networks. Anatomical (Brodmann) areas overlapping functional networks are given in parentheses, where Fpl, Cluster 4 is the 4th component of the parietal cortex local atlas and TBJb is the posterior TPJ component.

3.2 Correlation Between Personalized Diagnosis and ADOS Reports

To ensure the relevance of these components to ASD, each component output probability is correlated with the Total ADOS score and ADOS severity score. The Pearson correlation coefficient varies between −0.28 and 0.27 for Brodmann area/brain regions involved in neuro-circuits previously reported to be implicated in ASD. Table 2 shows the highly correlated nuero-circuits with ADOS reports and thier anatomical correspondence. To cross validate the relevance of these regions to ASD, each brain region was 347 was correlated with the Total ADOS score and ADOS severity score.

4 Conclusion, and Future Work

This study demonstrate that RfMRI could enhance both local and global diagnostic accuracy of ASD, with increased ability to predict clinical phenotypes, and potential ability to develop better individualized treatments plan. Specific affected networks could be a biomarker for correlation with specific types of behavioral abnormalities. Future research should focus further on using big data technology to integrate multiple datasets from larger populations and multiple modalities (structural MRI, DTI, etc.) to better understand clinically relevant neuro-biological pathways and assess response to personalized treatments in ASD. In addition, integrating information from multiple data sources such as behavioral reports and genetic profiles to get more insight about components of interest observed on each individual subject would be of great importance.