
1 Introduction

Different studies have demonstrated that the prevalence of abnormalities in midline brain structures is increased in patients with schizophrenia, as well as in other mood and psychotic disorders, with respect to healthy controls [1]. In this work, we focus on a variant of the cerebrospinal fluid (CSF) space formed between the leaflets of the septum pellucidum, called the cavum septum pellucidum (CSP) (see Fig. 1). During fetal development, both laminae tend to fuse in an anterior-to-posterior manner between the third and the sixth month of life; when these laminae do not fuse completely, the space remaining between them is the CSP. Detecting this finding can help to diagnose and study these disorders in terms of their neurodevelopmental etiology. In other words, the study of these abnormalities could help to investigate the origin of pathologies such as schizophrenia or bipolar disorder.

Fig. 1. Cavum Septum Pellucidum (CSP) seen from the anterior section.

The presence of a CSP with an anterior-to-posterior length of around 1–1.4 mm is considered normal brain anatomy, with an incidence of 60–80% [2]; a larger cavity is therefore considered an abnormality.

Manually detecting this kind of abnormality is a tedious task, and accurately defining the depth of this cavity can be quite controversial. Thus, a fully automated method would help researchers and clinical professionals to study these abnormalities in detail. To the best of our knowledge, there is no previous work in the literature addressing this issue. In this work, we propose different deep and shallow machine learning methods to detect the CSP slice-by-slice in Magnetic Resonance Images (MRI). In particular, we implement two Convolutional Neural Network (CNN) models with different architectures, as well as a variant that adds contextual information to the input of the network. We also use several classical shallow machine learning classification algorithms combined with different dimensionality reduction techniques, in order to compare their performance.

2 Data and Pre-processing

2.1 Data

In this study, the data consists of 861 subjects, of which 639 are patients with schizophrenia, bipolar disorder or other psychotic or mood disorders, and 223 are healthy controls. We use T1-weighted MRI. All subjects were scanned with a 1.5 Tesla GE Signa scanner (General Electric Medical Systems) located at Sant Joan de Déu Hospital in Barcelona.

A ground truth was created specifically for this study, consisting of 888 slices with CSP corresponding to 213 subjects and 26,510 slices from subjects without CSP. As can be noted, this is a highly imbalanced problem.

A second dataset was used to test the methods on completely new data, acquired at a different site with a different scanner: 500 slices from 10 subjects of the OASIS (Open Access Series of Imaging Studies) project.

2.2 Pre-processing

Given the small size of the CSP with respect to the whole MRI volume, it is useful to define a pre-processing pipeline that extracts a Region of Interest (ROI) containing the abnormality and discards the areas that are not useful for the task.

All MRI volumes are pre-processed automatically using the same pipeline. The main goal is to obtain an image in which the posterior genu is located at the same coordinate for every pre-processed volume, since this structure is assumed to mark the first slice of the CSP. The pipeline contains the following steps:

  • Skull-stripping: We remove the skull using FSL-BET [7] (Brain Extraction Tool) with a fractional intensity threshold of 0.4. This is a necessary preparation step for the subsequent registration and segmentation steps, following the FSL guidelines.

  • Registration: We transform all images so that they share a common coordinate system using FLIRT [8] (FMRIB’s Linear Image Registration Tool).

  • Segmentation of brain tissues and spatial intensity correction: We segment the different tissues (white matter, grey matter and CSF). We first use the white matter segmentation to find the posterior genu. This structure indicates the beginning of the CSP and, according to our clinical team, should not be deeper than 50 slices. Finally, we apply FSL-FAST, which uses a hidden Markov random field and an associated expectation-maximization algorithm, to correct for spatial intensity variations caused by RF inhomogeneities.

  • Region of Interest definition: Given this first slice, we define a volume of \(50\times 30\times 30\) voxels beginning at the posterior genu and spanning 50 slices from the anterior to the posterior part of the brain, where the CSP should appear.

  • Intensity normalization: The standardization of a dataset is a common requirement for many machine learning estimators, which often expect individual features to resemble standard normally distributed data. Thus, we normalize the intensities per patient so that they all lie within the same range (a minimal sketch of this step and the slicing step is shown after this list).

  • Slicing to 2D images: We extract 2D \(50\times 30\) coronal slices.
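To make the last steps of the pipeline concrete, the following is a minimal sketch of the ROI extraction, per-patient intensity normalization and coronal slicing, assuming NumPy and NiBabel; the axis ordering, the crop extents and the function and variable names are illustrative assumptions rather than the exact implementation used in this work.

```python
import numpy as np
import nibabel as nib

def extract_roi_slices(volume_path, genu_voxel):
    """Crop a 50x30x30 ROI starting at the posterior genu, standardize
    the intensities per patient and return 2D 50x30 coronal slices.

    The mapping between array axes and anatomical directions is an
    assumption for illustration; it depends on the registration target.
    """
    data = nib.load(volume_path).get_fdata()

    x0, y0, z0 = genu_voxel  # voxel coordinate of the posterior genu
    roi = data[x0:x0 + 50, y0:y0 + 30, z0:z0 + 30]

    # Per-patient standardization (zero mean, unit variance)
    roi = (roi - roi.mean()) / (roi.std() + 1e-8)

    # Extract 2D 50x30 slices along the last axis of the ROI
    return [roi[:, :, k] for k in range(roi.shape[2])]
```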

Finally, to increase the number of samples of the under-represented CSP class, we flip each image along both the horizontal and vertical axes. This data augmentation procedure was agreed upon with clinical experts from the FIDMAG Research Foundation.
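As an illustration of this augmentation step, a minimal sketch using NumPy flips is shown below; applying the flips only to the CSP class follows the text, while the helper name is an assumption.

```python
import numpy as np

def augment_csp_slice(slice_2d):
    """Return the original slice plus its flips along both image axes,
    used only for the under-represented CSP class."""
    return [
        slice_2d,
        np.flipud(slice_2d),  # flip along the vertical (up-down) axis
        np.fliplr(slice_2d),  # flip along the horizontal (left-right) axis
    ]
```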

3 Methods

Given the 2D ROI images, we classify them as containing a CSP or not using different machine learning methods.

3.1 Shallow Machine Learning Methods

Shallow machine learning methods generally use a first step to define features from the images before applying the classification algorithm. Next, we detail the dimensionality reduction and classification algorithms used in this work.

In order to reduce the dimensionality of the data, we consider two different approaches: feature selection and feature extraction. The former selects the most important features of the image, while the latter builds derived features to better classify the data.

We apply an Extremely Randomized Trees (extra-trees) classifier [6] as the feature selection algorithm. In order to extract meaningful features, we use Principal Component Analysis (PCA) and Histograms of Oriented Gradients (HOG).
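The following is a minimal sketch of these two steps with scikit-learn, using ExtraTreesClassifier together with SelectFromModel for feature selection and PCA for feature extraction; scikit-image's hog is used here as a stand-in for the HOG descriptor (the implementation in this work relies on OpenCV), and all parameter values are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.decomposition import PCA
from skimage.feature import hog

def reduce_dimensionality(X, y, images):
    """X: flattened 50x30 ROI slices, shape (n_slices, 1500);
    y: CSP labels; images: the same slices as 2D arrays."""
    # Feature selection: keep the pixels the extra-trees find informative
    selector = SelectFromModel(
        ExtraTreesClassifier(n_estimators=100, random_state=0)
    ).fit(X, y)
    X_selected = selector.transform(X)

    # Feature extraction: project onto the main principal components
    X_pca = PCA(n_components=50).fit_transform(X)

    # Feature extraction: one HOG descriptor per 2D slice
    X_hog = np.array([
        hog(img, orientations=9, pixels_per_cell=(5, 5), cells_per_block=(2, 2))
        for img in images
    ])
    return X_selected, X_pca, X_hog
```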

We compare the performance of the following four shallow classification algorithms, using the features extracted with both PCA and HOG: K Nearest Neighbors (KNN), Linear Support Vector Classifier (Linear SVC), Random Forests and Linear Discriminant Analysis (LDA).
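A minimal sketch of this comparison with scikit-learn is shown below; the hyperparameters and the use of 5-fold cross-validation are illustrative assumptions rather than the exact evaluation protocol of this work.

```python
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

classifiers = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Linear SVC": LinearSVC(C=1.0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "LDA": LinearDiscriminantAnalysis(),
}

def compare_classifiers(X_features, y):
    """Cross-validated accuracy of each shallow classifier on a given
    feature representation (e.g. the PCA or HOG features)."""
    for name, clf in classifiers.items():
        scores = cross_val_score(clf, X_features, y, cv=5, scoring="accuracy")
        print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```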

We implement these methods using scikit-learn, OpenCV and Nilearn (Python).

3.2 Convolutional Neural Networks Models

Deep machine learning methods learn the features directly from the images. In this study, we consider two different 2D CNN models which receive the raw ROIs as input instead of features extracted from the images [5].

We explore two different approaches, illustrated in Figs. 2 and 3. The first model stacks two convolutional layers with a \(3\times 3\) kernel at the beginning (see Fig. 2), while the second one implements only one convolutional layer (see Fig. 3). The two models also differ in an extra dropout layer placed before the last fully connected layer of the first CNN.
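As an orientation, the following is a minimal Keras sketch in the spirit of the first model (two stacked 3x3 convolutions and a dropout layer before the last fully connected layer); the filter counts, pooling, dense sizes and training settings are illustrative assumptions, since the exact architectures are only given in Figs. 2 and 3.

```python
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

def build_first_cnn(input_shape=(50, 30, 1)):
    """Two stacked 3x3 convolutions followed by a fully connected part
    with a dropout layer before the last dense layer (binary output)."""
    model = Sequential([
        Conv2D(32, (3, 3), activation="relu", input_shape=input_shape),
        Conv2D(32, (3, 3), activation="relu"),
        MaxPooling2D(pool_size=(2, 2)),
        Flatten(),
        Dense(64, activation="relu"),
        Dropout(0.5),                    # extra dropout of the first model
        Dense(1, activation="sigmoid"),  # CSP vs. no CSP
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```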

Fig. 2. First CNN model.

Fig. 3. The second CNN model with 3 levels of zoom as input.

Note that the definition of the field of view of the ROI can be decisive for the classifier performance, given the large size of the brain compared to the small size of the region we are interested in. Thus, defining a ROI that reduces the amount of unnecessary information could benefit the performance of our method [9].

In order to learn from contextual information around the CSP, we propose to change the input of the CNN model by adding three different channels, containing three different levels of zooming of the ROI (see Fig. 4).

We expect that this information can help to better detect the CSP.
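One possible way to build such a three-channel input is to crop the same region at three spatial extents and resample every crop to the ROI size before stacking the crops as channels; the margins, the resampling with scipy.ndimage.zoom and the function names below are illustrative assumptions, not the procedure described in the paper.

```python
import numpy as np
from scipy.ndimage import zoom

def stack_zoom_channels(slice_full, roi_box, target_shape=(50, 30)):
    """Build a (50, 30, 3) input where each channel shows the ROI at a
    different level of zoom; roi_box = (x0, x1, y0, y1) locates the ROI
    inside the full coronal slice."""
    x0, x1, y0, y1 = roi_box
    channels = []
    for margin in (0, 10, 25):  # from tighter to wider field of view
        crop = slice_full[max(x0 - margin, 0):x1 + margin,
                          max(y0 - margin, 0):y1 + margin]
        factors = (target_shape[0] / crop.shape[0],
                   target_shape[1] / crop.shape[1])
        channels.append(zoom(crop, factors))  # resample to the ROI size
    return np.stack(channels, axis=-1)
```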

In addition, we test adding these three different levels of zooming of the ROI as three different input layers, joined in a merge layer before the first convolutional layer of the model. We only extend the second model since, in our experimental analysis, it was the best performing method and we intended the contextual information to improve only the best CNN model. Figure 3 shows the architecture of the second model with 3 levels of zoom as input.
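A minimal sketch of this merge-layer variant with the Keras functional API is shown below; the concrete layer sizes are illustrative assumptions, with only the three inputs, the merge before the first convolution and the single convolutional layer taken from the description above.

```python
from keras.models import Model
from keras.layers import (Input, Conv2D, MaxPooling2D, Flatten, Dense,
                          Concatenate)

def build_second_cnn_with_zoom(input_shape=(50, 30, 1)):
    """Second model extended with three zoom levels merged before the
    first (and only) convolutional layer."""
    inputs = [Input(shape=input_shape) for _ in range(3)]  # one per zoom level
    merged = Concatenate(axis=-1)(inputs)                  # merge layer

    x = Conv2D(32, (3, 3), activation="relu")(merged)
    x = MaxPooling2D(pool_size=(2, 2))(x)
    x = Flatten()(x)
    x = Dense(64, activation="relu")(x)
    output = Dense(1, activation="sigmoid")(x)

    model = Model(inputs=inputs, outputs=output)
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    return model
```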

We implement the different models using the Keras framework (in Python).

4 Experimental Results

In this section, we present the experimental results of classifying the 2D images as containing a CSP or not, using the proposed machine learning methods.

In Table 1, we compare the results of the two CNN models, the variant of the second model, and the different classical machine learning algorithms. As can be seen, the results of the three CNN models are comparable, although the last variant gives slightly better sensitivity and specificity (Table 2).

The best model is CNN 2 with the 3 zoom levels as channels, which indicates that adding contextual information is important to improve the performance of the classifier.

Fig. 4. Example of the three levels of zoom used in the second CNN model.

Table 1. Results on the first dataset (images from the same scanner).
Table 2. Results on the second dataset (images from a different scanner).
Fig. 5. (A) Top: qualitative results of KNN. (B) Bottom: qualitative results of the best CNN.

In Fig. 5, we show qualitative results, True Positives (TP), True Negatives (TN), False Positives (FP) and False Negatives (FN), for the KNN method and for the best CNN method, respectively. As can be seen in the examples, defining the CSP is not a simple task, since its boundaries are difficult to discriminate.

Our CNN method is able to detect the presence of the CSP from front to rear in the axial projection of an MRI volume. These approaches can classify almost perfectly the slices where the CSP can be easily seen by an expert, such as the first two images in Fig. 5A. The model fails on certain small or thick CSPs, specifically at the first or last slices of the CSP, such as all the images in the false-positive column of Fig. 5A.

Note that KNN and the best CNN approach both reached 99% accuracy on the first dataset; however, they differ in the type of errors they make. CNNs tend to fail on images where the CSP is difficult to identify, as in the image in the last column, second row of Fig. 5B, whereas KNN also seems to fail on images that are easier to identify, as in the third image of the first and last rows of Fig. 5B. Despite its high accuracy, the KNN method gives some false positives and false negatives in axial slices of the brain volume where it is anatomically impossible for the CSP to appear. Moreover, KNN, unlike the CNNs, is highly dependent on the preprocessing (see Table 2, where the KNN accuracy decreased by 19% with respect to the results obtained on the first dataset, whereas the CNN only decreased by 4%), and it is sensitive to noise and to the local structure of the data.

Fig. 6. First and last slice of the CSP, predicted vs. real, in a regression plot. Variance of 0.8.

The validity of the model can be seen in Fig. 6, where we used 90 subjects to compare the results of our model with manual annotations made by an expert. We annotated both the first and the last slice of the CSP and plotted the predicted slices against the expected ones. A perfect model would place all points on the diagonal line and, except for a few outliers (false positives or false negatives), Fig. 6 shows that the predicted slices are close to the expert manual annotations.
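As an illustration of how such a comparison can be plotted, the following is a minimal matplotlib sketch of predicted versus expert-annotated slice indices; the function and variable names are assumptions, not the code used to produce Fig. 6.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_predicted_vs_real(real_slices, predicted_slices):
    """Scatter of predicted vs. expert-annotated slice indices; a perfect
    model would place every point on the diagonal line."""
    real = np.asarray(real_slices)
    pred = np.asarray(predicted_slices)

    plt.scatter(real, pred, alpha=0.6)
    lims = [min(real.min(), pred.min()), max(real.max(), pred.max())]
    plt.plot(lims, lims, "k--", label="perfect prediction")
    plt.xlabel("Expert-annotated slice index")
    plt.ylabel("Predicted slice index")
    plt.legend()
    plt.show()
```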

5 Conclusions and Future Work

In this work, we studied, for the first time in the literature, the automatic detection of midline brain abnormalities in MRI slice-by-slice. For this purpose, we compared three different approaches based on CNNs with classical shallow machine learning methods. KNN and the best CNN approach reached 99% accuracy when the training and validation images come from the same scanner, but the KNN accuracy dropped to 80.1% when classifying images from a different site, while the CNN only decreased by 4% in this scenario.

To the best of our knowledge, there are no studies in the literature with such a large number of patients that study the relation of this volume with mental disorders. In a further study, we want to approach the problem with 3D segmentation algorithms, such as the one in [3].

We expect that using a 3D model will help to take into account spatial coherence information, which can add robustness to the results. We also plan to address the segmentation problem in order to obtain information about the volume of the CSP. To do so, we will have to create a new ground truth for a subset of the data with manually delineated borders, and we will consider fully convolutional networks such as the one presented in [4]. Finally, we will integrate the best methodology into a public software tool so that researchers and clinical professionals can conduct their own studies and incorporate this information in a translational manner for their patients.