1 Introduction

Fluid intelligence is a core factor of general intelligence. The rate at which skills and knowledge, i.e. crystallized intelligence, are acquired depends upon it. Thus, there is great interest in determining the extent to which fluid intelligence can be determined from brain measures.

In this study, we used a supervised learning model to automatically predict fluid intelligence scores, i.e. fluid intelligence with demographic confounding factors removed, based on T1-weighted Magnetic Resonance Images (MRIs) acquired at 3T. More specifically, this report describes our submission to the ABCD Neurocognitive Prediction Challenge (ABCD-NP-Challenge 2019). Data were obtained from the NIMH Data Archive (NDA) database, generated by the Adolescent Brain Cognitive Development (ABCD) study, the largest long-term study of brain development and child health in the United States. The feature set used for the prediction model included sociodemographic (age, gender) and MRI-derived measures. Our MRI-derived features included regionally averaged cortical thickness and white/gray contrast measures in addition to the volumes of a set of regions of interest provided by the challenge organizers. Relying on this feature set, we created a supervised regression framework utilizing a four-layer fully-connected neural network (FNN) to predict fluid intelligence scores.

2 Material

2.1 Training and Test Data

T1-weighted MR images, volumetric measures, age, gender, scanner and fluid intelligence scores were available to the ABCD challenge participants via the National Database for Autism Research (NDAR) website. A primary consideration in measuring general intelligence is the role of fluid intelligence [5], which was measured via the NIH Toolbox Neurocognition battery [2] and from which demographic factors (e.g., sex and age) were removed to eliminate the effect of confounding variables. For this residualization, the challenge organizers built a linear regression model using all subjects without missing values for data collection site, sociodemographic variables and brain volume. This model was constructed with fluid intelligence as the dependent variable and the other attributes as independent variables. The residuals computed for all subjects provided the fluid intelligence scores to be predicted.
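The residualization described above can be illustrated with a minimal sketch. It is not the organizers' code: the function names, the two toy covariates (age in months and a binary sex indicator) and the toy scores are all hypothetical, and ordinary least squares is solved through the normal equations purely to keep the example dependency-free.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def residualize(scores, covariates):
    """Regress scores on covariates (plus an intercept) and return the residuals."""
    design = [[1.0] + list(row) for row in covariates]
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, scores)) for i in range(p)]
    beta = solve_linear_system(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in design]
    return [y - f for y, f in zip(scores, fitted)]


# Hypothetical toy data: [age in months, sex indicator] and raw scores.
covariates = [[108, 0], [115, 1], [120, 0], [126, 1], [131, 0]]
scores = [10.0, 14.0, 13.0, 18.0, 17.0]
residuals = residualize(scores, covariates)
```

By construction, the residuals sum to zero and are uncorrelated with each covariate, which is what makes them "scores with confounds removed".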

From a total of 8553 individual subjects, fluid intelligence scores were provided for participants in the training set (3736 subjects) and validation set (415 subjects), whereas the remaining subjects (4402 subjects) formed the test set. As explained in Sect. 2.2, a few of these subjects could not be processed through the CIVET pipeline and hence were not used to predict the fluid intelligence scores. The age and gender characteristics of the 8347 subjects used in this work, scanned on scanners from three different manufacturers, are presented in Table 1. There was no significant difference in age or gender between scanners.

The FNN was trained with a set of 3568 subjects and validated during the training process on the validation set consisting of 396 subjects. Afterwards, the proposed method was used to generate the predictions of the fluid intelligence scores of the 4383 subjects in the test set.

Table 1. Characteristics of the subjects used in this work. The ages of the subjects ranged from 107 to 133 months.

2.2 Image Pre-processing

Volume measures, provided by the competition organizers, were derived from the T1-weighted images as follows: the Minimal Processing pipeline [8] transformed the raw data into NIfTI format. Afterwards, the NCANDA pipeline [16] defined a brain mask by non-linearly mapping the SRI24 atlas [17] to the T1-weighted images, and it removed noise and corrected for bias-field inhomogeneities. Several skull-stripping methods were applied to both bias-field corrected and non-bias-field corrected images, and majority voting over the resulting masks refined the brain masks from the previous step. The skull-stripped brains were corrected again for bias-field inhomogeneities and segmented into gray matter, white matter and cerebrospinal fluid via Atropos [4]. The SRI24 atlas was non-rigidly registered to the images via ANTS [3] to further parcellate the gray matter, and the resulting segmentations were linearly registered to the SRI24 atlas. Finally, results that failed to pass a visual two-tier quality check were rejected.

In addition to volume measures, we used cortical thickness and cortical white/gray contrast measures that were regionally averaged based on the Automated Anatomical Labeling (AAL) atlas [21]. For this, the T1-weighted volumes were denoised [15] and processed with CIVET (version 2.1; 2016), a fully automated structural image analysis pipeline developed at the Montreal Neurological Institute. CIVET corrects intensity non-uniformities using N3 [18]; aligns the input volumes to the Talairach-like ICBM-152-nl template [7]; classifies the image into white matter, gray matter, cerebrospinal fluid, and background [20, 22]; extracts the white-matter and pial surfaces [11]; and maps these to a common surface template [14].

Cortical thickness (CT) was measured in native space at 81924 vertices using the Laplacian distance between the two surfaces. The Laplacian distance is the length of the path between the gray and white surfaces following the tangent vectors of the cortex represented as a Laplacian field [10]. The CT measures were averaged into 78 regional measures relying on the AAL atlas.
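The regional averaging step can be sketched as follows. This is only an illustration under stated assumptions: the function name, the region labels and the five toy vertices are hypothetical, and in practice each of the 81924 vertices would carry one of the 78 AAL labels.

```python
from collections import defaultdict


def regional_means(vertex_values, vertex_labels):
    """Average vertex-wise measurements within each atlas region."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, label in zip(vertex_values, vertex_labels):
        sums[label] += value
        counts[label] += 1
    return {label: sums[label] / counts[label] for label in sums}


# Hypothetical toy example: five vertices assigned to two regions.
thickness_mm = [2.5, 3.0, 3.5, 2.0, 3.0]
labels = ["region_a", "region_a", "region_a", "region_b", "region_b"]
means = regional_means(thickness_mm, labels)
```

The same helper applies unchanged to the vertex-wise contrast ratios, since both measures are defined on the same surface vertices.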

To extract the white/gray contrast measures, similarly to [13], the intensity on the T1-weighted MRI was sampled 1 mm inside and 1 mm outside of the white surface, and the ratio of the two measures was formed. Here we used a highly simplified version of the algorithm of [13] and generated supra-white and sub-white surfaces relying on the surface normals provided by the CIVET pipeline. The intensity values on the T1-weighted image (without non-uniformity correction or normalization) were sampled at each vertex of both the supra-white surface and the sub-white surface, and the ratio was formed by dividing the value at each vertex of the sub-white surface by the value at the corresponding vertex of the supra-white surface. Similarly to the CT measures, the contrast measures were averaged into 78 regional measures relying on the AAL atlas. The white/gray contrast measures are sensitive to scanner-specific differences in tissue contrast [13]; to correct for this, we z-scored the contrast values separately for each scanner manufacturer, as explained in detail in [13].
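A minimal sketch of the two operations just described, the vertex-wise ratio and the manufacturer-wise z-scoring, is given below. The function names, the placeholder manufacturer labels and the toy values are hypothetical, and the use of the population standard deviation is an assumption, as the text does not state which estimator was used.

```python
from statistics import mean, pstdev


def contrast_ratio(sub_white, supra_white):
    """Divide the intensity sampled below the white surface by the one above it."""
    return [inner / outer for inner, outer in zip(sub_white, supra_white)]


def zscore_by_group(values, groups):
    """Z-score values separately within each group (e.g. scanner manufacturer)."""
    normalized = [0.0] * len(values)
    for group in sorted(set(groups)):
        idx = [i for i, g in enumerate(groups) if g == group]
        group_values = [values[i] for i in idx]
        mu = mean(group_values)
        sd = pstdev(group_values)  # assumption: population standard deviation
        for i in idx:
            normalized[i] = (values[i] - mu) / sd
    return normalized


# Hypothetical intensities at two vertices of one subject.
ratios = contrast_ratio([110.0, 120.0], [100.0, 100.0])

# Hypothetical regional contrast values for four subjects from two manufacturers.
contrast = [1.2, 1.4, 1.1, 1.3]
manufacturer = ["vendor_a", "vendor_a", "vendor_b", "vendor_b"]
contrast_z = zscore_by_group(contrast, manufacturer)
```

After this step each manufacturer's values have zero mean and unit variance, so a systematic offset between vendors no longer appears as a feature difference.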

The CIVET pipeline failed to process 168 subjects from the training set, 19 subjects from the validation set and 19 subjects from the test set, most likely due to motion artifacts and/or excessive noise interfering with registration and segmentation. Consequently, a different model trained exclusively on the provided volumetric and sociodemographic data was used to infer the fluid intelligence score of the validation subjects whose derived data could not be produced.

3 Machine Learning Approach

The regression model, based on artificial neural networks, was trained with feature vectors that incorporated 122 volumetric, 78 contrast and 78 CT measures along with gender, age and the one-hot encoded scanner manufacturer. Age and image-derived attributes were normalized feature-wise by subtracting their mean and dividing by their standard deviation. The network was a four-layer FNN. The model was trained using mini-batches of size 24 and Adam [12], a stochastic gradient descent method with adaptive learning rates, with an initial learning rate of \(\eta = 0.00001\). The cost function to minimize was the mean squared error (MSE). After every epoch the model was validated, and training stopped when the MSE exceeded the minimum MSE obtained in the previous epochs plus 0.7. This stopping criterion was empirically found to increase the correlation coefficients of the predictions.
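The stopping rule can be written as a short sketch. It assumes that the MSE being monitored is the one measured on the validation set after each epoch; the function name and the toy loss sequence are hypothetical.

```python
def stopping_epoch(mse_per_epoch, tolerance=0.7):
    """Return the 1-based epoch at which training stops, or None if it never does.

    Training stops as soon as the current MSE exceeds the smallest MSE seen
    in earlier epochs by more than `tolerance`.
    """
    best_so_far = float("inf")
    for epoch, mse in enumerate(mse_per_epoch, start=1):
        if mse > best_so_far + tolerance:
            return epoch
        best_so_far = min(best_so_far, mse)
    return None


# Hypothetical validation MSE values, one per epoch.
toy_curve = [90.0, 80.0, 75.0, 75.5, 74.0, 74.9]
stop_at = stopping_epoch(toy_curve)
```

In the toy curve, the rise from 75.0 to 75.5 stays within the 0.7 margin and training continues, whereas 74.9 exceeds the best value of 74.0 by 0.9 and triggers the stop. The margin therefore tolerates small fluctuations, such as those caused by a small batch size.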

The four-layer FNN was trained by adjusting its parameters using the back-propagation algorithm to minimize the MSE between the desired output and the network prediction. The input layer consisted of 283 nodes, one for each feature of the input vector. The two hidden layers had 20 and 15 nodes, respectively. Since predicting fluid intelligence scores is a single-output regression problem, the output layer consisted of a single node. As the network was fully connected, all nodes between successive layers were connected. This configuration provided the best results among the variations that were tried in the limited time available.
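As a quick arithmetic check, the sketch below recovers the input size from the feature counts and derives the number of trainable parameters for the described layer sizes. It assumes that gender and age each occupy one input and that the three scanner manufacturers are encoded with three one-hot inputs; the helper name is hypothetical.

```python
# Feature counts taken from the text; the encoding widths are assumptions.
n_volumes, n_contrast, n_thickness = 122, 78, 78
n_gender, n_age, n_manufacturers = 1, 1, 3
n_features = (n_volumes + n_contrast + n_thickness
              + n_gender + n_age + n_manufacturers)

layer_sizes = [n_features, 20, 15, 1]


def count_parameters(sizes):
    """Count weights and biases of a fully connected network."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


n_parameters = count_parameters(layer_sizes)
```

Under these assumptions the input size is 283, matching the text, and the network has 6011 trainable parameters, which exceeds the 3568 training subjects.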

The weights of the FNN were randomly initialized as proposed in [9], whereas the bias terms were initialized to zero. An exponential linear unit (ELU) nonlinear activation function [6] was used in all intermediate layers with \(\alpha = 1\), such that \(f(x) = x\) if \(x > 0\) and \(f(x) = \alpha (\exp(x) - 1)\) otherwise. All layers except for the input layer had a dropout rate of 0.5 [19]. The training of the FNN was implemented with TensorFlow [1].
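For reference, the ELU activation can be written as a plain scalar function. This is a minimal illustrative sketch with a hypothetical function name, not the TensorFlow implementation used for training.

```python
import math


def elu(x, alpha=1.0):
    """Exponential linear unit: identity for x > 0, alpha * (exp(x) - 1) otherwise."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


positive_output = elu(2.0)    # unchanged for positive inputs
negative_output = elu(-1.0)   # saturates towards -alpha for large negative inputs
```

Unlike the rectified linear unit, the ELU returns small negative values for negative inputs, so the mean activation of a layer can stay closer to zero.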

Fig. 1. Comparison of the MSE and correlation among FNNs trained on different feature sets in internal 5-fold cross-validation and validation set experiments. “All” combines volumetric, contrast and thickness measures with age, gender and the scanner manufacturer.

4 Results

Our predictions were generated by the regression model trained on the 3568 subjects for which all image-derived features were obtained. The final submission consisted of 4383 predictions, achieving an MSE of 94.0270.

The performance of the model on the validation set and in an internal 5-fold cross-validation test was used to study how different combinations of features contribute to the MSE and correlation (Fig. 1). The correlation here refers to the Pearson correlation coefficient between the predicted and actual fluid intelligence scores. Both experiments showed that the combination of all features yielded larger correlation coefficients (Fig. 1b). However, the variability of the MSE among models trained on different sets of features was small (Fig. 1a).
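The two evaluation metrics can be computed as in the following sketch. The function names and toy values are hypothetical; the toy predictions are deliberately a scaled-down copy of the targets to show that a high correlation does not imply a low MSE, which is why the two metrics can move independently.

```python
import math


def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical residualized scores and predictions shrunk towards zero.
y_true = [-10.0, 0.0, 10.0, -5.0, 5.0]
y_pred = [-1.0, 0.0, 1.0, -0.5, 0.5]
toy_mse = mean_squared_error(y_true, y_pred)
toy_r = pearson_r(y_true, y_pred)
```

Here the predictions are perfectly correlated with the targets, yet the MSE remains large because the predictions are too close to zero.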

The proposed method trained with all features combined achieved an MSE of 81.89 and a correlation of 0.13 in an internal 5-fold cross-validation experiment. The results obtained on the validation set were an MSE of 71.596 and a correlation of 0.151, and the corresponding training loss was 84.28 (Fig. 2(a)).

As depicted in Figs. 2(a) and (b), choosing a small batch size to train the FNN causes large oscillations in the training loss, whereas choosing the largest possible batch size leads to a steady decrease of the MSE. Using the entire training data set for each update yields more accurate estimates of the gradient direction that minimizes the loss over the training data. On the other hand, small batch sizes may need more iterations to converge, but the resulting fluctuations during training can lead to other local minima with potentially better generalization capabilities. Figures 2(c) and (d) show that with a batch size of 3568, not only was the number of required epochs significantly larger, but the proposed FNN also yielded a worse MSE and correlation than with a batch size of 24.
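To make the difference between the two settings concrete, the sketch below counts the parameter updates per epoch for each batch size. The helper names are hypothetical, and keeping the final, smaller batch is an assumption, since the text does not say how the remainder was handled.

```python
import math


def updates_per_epoch(n_samples, batch_size):
    """Number of gradient updates in one pass over the data (last partial batch kept)."""
    return math.ceil(n_samples / batch_size)


def iterate_minibatches(indices, batch_size):
    """Yield consecutive slices of `indices` of at most `batch_size` elements."""
    for start in range(0, len(indices), batch_size):
        yield indices[start:start + batch_size]


n_train = 3568
updates_small_batch = updates_per_epoch(n_train, 24)
updates_full_batch = updates_per_epoch(n_train, 3568)
batch_lengths = [len(b) for b in iterate_minibatches(list(range(n_train)), 24)]
```

With a batch size of 24 the network receives 149 noisy updates per epoch, against a single exact one with the full batch, which is consistent with the much larger number of epochs needed in the full-batch setting.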

Fig. 2. Training loss and validation results of models trained with batch sizes of 24 and 3568 assessed on the validation set. Training loss is presented as a moving average using a window size of 51 epochs.
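The smoothing applied to the training loss can be sketched as a simple moving average. The function name is hypothetical, and the edge handling is an assumption: only windows that fit entirely inside the sequence are returned, as the caption does not specify whether the window was centered or how the borders were treated.

```python
def moving_average(values, window=51):
    """Return the mean of every full window of `window` consecutive values."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    running = sum(values[:window])
    averages = [running / window]
    for i in range(window, len(values)):
        running += values[i] - values[i - window]  # slide the window by one
        averages.append(running / window)
    return averages


# Hypothetical short loss curve smoothed with a window of 3.
smoothed = moving_average([1, 2, 3, 4, 5], window=3)
```

With a window of 51 epochs, a loss curve of `n` epochs yields `n - 50` smoothed values under this convention.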

4.1 Computation Time

The four-layer FNN was implemented in TensorFlow (Python). The total running time for training the model using all features was approximately three and a half minutes. Generating predictions for the test set took half a second. The regression model was run on Ubuntu 16.04 with an Intel Xeon W-2125 CPU @ 4.00 GHz processor and 64 GB of memory.

Processing a subject through the CIVET pipeline entailed a computational time of approximately 10 hours on a cluster with Intel Xeon E5-2683 V4 @ 2.1 GHz processors using a single core with 4 GB of memory.

5 Discussion

We presented a method based on artificial neural networks to predict fluid intelligence scores from T1-weighted MR images, age and gender. Contrast and CT measures were additionally derived from the MR images to complement the provided volumetric and sociodemographic data. Training the proposed model with the combination of all image-derived features provided larger correlation coefficients than training it solely on volumetric features. Nonetheless, the overall MSE of the predicted scores did not improve. The batch size selected to train the FNN caused the training loss to oscillate. However, as shown in Sect. 4, it increased the capability of the model to generalize. With this setting, the regression model achieved an MSE of 71.596 and a correlation of 0.151 in the validation set of 415 subjects. Due to the inherent complexity of the regression problem and the incorporation of additional image-derived features, future work is required to explore different architectures and deeper neural network models.