1 Introduction

Intracerebral hemorrhage (ICH) is a form of stroke associated with high mortality and morbidity [1, 16]. Most patients who survive a hemorrhagic stroke develop long-term disabilities as a result of compression of the brain tissue around the affected region caused by edema [22]. Radiological imaging such as Computed Tomography (CT) is typically used for diagnosis, treatment planning, and prognosis monitoring of ICH patients. Traditionally, radiologists visualize the hematoma by manually delineating it on the CT scan and estimate its initial volume, which is used to predict the mortality and functional outcome of the patient. This traditional process is limited by the length of the manual delineation, inter-rater variability, and the need for highly trained radiologists at all times. Accurate automated segmentation is therefore important for precise quantitative analysis of the hematoma.

Recently, deep learning based automated segmentation approaches have gained momentum, as they can perform complex tasks quickly and with accuracy approaching that of human specialists. Recent examples include brain tumor segmentation [11], ischemic lesion segmentation [5], lung tumor segmentation [12], cardiac segmentation [25] and pancreas segmentation [4]. In particular, automated hemorrhage (stroke lesion) segmentation has received increasing attention in stroke management, as it can handle vast amounts of data and support clinicians in numerous complex decisions. Choi et al. [8] propose an ensemble of deep neural networks for automated prognosis of post-treatment ischemic stroke. To overcome the computational burden of 3D ischemic MRI scans, Kamnitsas et al. [17] devise a 3D CNN with a dense training scheme that processes adjacent image patches in one pass while automatically adapting to the inherent class imbalance. Chen et al. [4] exploit two CNNs, consisting of DeconvNets [21] and a multi-scale convolutional label evaluation net, to segment acute ischemic lesions from diffusion-weighted MR imaging (DWI). All these models aim for state-of-the-art performance by utilizing different 2D, 3D and dual-path CNN architectures with handcrafted features and CRF post-processing. However, none of these methods have been applied to CT scans for intracerebral hemorrhage (ICH) segmentation. On the other hand, RADnet [9] uses a recurrent attention DenseNet [14] with an LSTM to segment and classify brain hemorrhage from CT scans, but in the case of traumatic brain injury (TBI).

In this paper, we propose a novel deep learning model (ICHNet) with a brain-mask training scheme to segment intracerebral hemorrhage (ICH). In pixel-level segmentation with a convolutional predictor (for example, a CNN), stochastic gradient descent (SGD) treats the training data independently and predicts each pixel separately [6]. Besides, max-pooling and striding create spatial insensitivity in the higher layers, which limits spatial accuracy in pixel-wise segmentation. To mitigate this problem, Hariharan et al. [10] extract features for the same pixel from multiple layers and form a vector called a “hypercolumn”. Bansal et al. [3] randomly sample a moderate number of pixels in the training phase to keep memory bounded and to reduce overfitting due to the feature correlation of spatially neighboring pixels. PixelNet [2] exploits the hypercolumn technique [10] and random sampling [3] to form a hypercolumn descriptor for each sampled pixel from multiple convolutional layers. Subsequently, Islam et al. [13] utilize a multi-modal PixelNet to segment brain tumors from MRI scans and achieve state-of-the-art performance. DeepLab [6], PSPNet [27], and ICNet [26] adopt ‘atrous convolution’ [7] to explicitly control the resolution and incorporate larger context without increasing the number of parameters or the amount of computation. Our work is inspired by multi-modal PixelNet [2, 13] and atrous convolution [6, 7, 26, 27] to design a computationally efficient, state-of-the-art learning model for intracerebral hemorrhage (ICH) segmentation. The most significant contributions of our work are in four aspects: (1) To our knowledge, this is the first work on automated intracerebral hemorrhage (ICH) segmentation from CT scans using deep learning; (2) The proposed model can be trained by sampling only a modest number of pixels from within the brain region, whereas conventional deep learning approaches use the whole image or image patches including background. Because background and image padding are excluded from learning, the model converges faster with a better prediction rate; (3) Class imbalance in the training dataset leads to a bias towards certain classes in the convolutional prediction. We deal with this problem by sampling an equal number of pixels for each class; (4) Compared to multi-modal PixelNet [13], we adopt an atrous convolution layer and a Dice loss layer for prediction, as well as a 3D CRF and largest-component analysis as post-processing.

Fig. 1.

ICHNet architecture. Hypercolumn features are extracted from multiple convolutional layers and fed to an MLP to predict the hematoma segmentation.

2 Proposed Method

Our proposed model (Fig. 1) samples diverse pixels from a ROI (the brain region) and constructs hypercolumns (hp) from multi-scale convolutional and atrous convolutional layer features, as in previous work [2, 10, 13]. It contains 15 convolutional layers in total: the first 13 layers (\(c_{i,j}\)) are similar to the convolutional part of VGG-16 [23] (Convolution, ReLU, Pooling), and the last 2 convolutional filters (\(c_i\)) follow [19]. We integrate atrous convolution (ac) according to PSPNet [27]. To predict the pixel-wise segmentation from the hypercolumn features, we utilize a multi-layer perceptron (MLP) with 3 fully connected (fc) layers of size 4096, each followed by a ReLU activation. The convolutional and fully connected layers of our architecture can be denoted as {\(c_{11}\), \(c_{12}\), \(c_{21}\), \(c_{22}\), \(c_{31}\), \(c_{32}\), \(c_{33}\), \(c_{41}\), \(ac_{42}\), \(c_{43}\), \(c_{51}\), \(ac_{52}\), \(c_{53}\), \(c_6\), \(c_7\), \(h_p\), \(fc_1\), \(fc_2\), \(fc_3\)}. Hypercolumn features are extracted from the 6 convolutional layers {\(c_{12}\), \(c_{22}\), \(c_{33}\), \(c_{43}\), \(c_{53}\), \(c_7\)}. As our model learns only inside a predefined ROI, we denote the hypercolumn as \(h_{p\_ROI}\), where \(p\_ROI\) is a random pixel inside the ROI. Therefore, we can formulate the hypercolumn as:

$$\begin{aligned} h_{p\_ROI} = [c_{1(p\_ROI)}, c_{2(p\_ROI)}, ..., c_{M(p\_ROI)} ], \end{aligned}$$
(1)

where \(c_{i(p\_ROI)}\) denotes the feature vector of the pixel \(p\_ROI\) from the \(i^{th}\) convolutional layer. A key element of our model is an extra layer, called ‘pixels’, which carries the coordinates of the pixels we want to train on. Due to this layer, the model is free to choose random pixels inside the ROI. It can also select an equal number of pixels from each class, which helps to overcome the data skewness problem. If N is the number of sampled pixels and there are K classes in our dataset, then we choose N/K pixels from each class to form the hypercolumn. We also adopt a Dice loss function similar to [20] to overcome the class imbalance problem.
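To make the hypercolumn construction and the sampled-pixel loss concrete, below is a minimal sketch in PyTorch-style Python. It is an illustrative re-implementation rather than the original CAFFE code; the function names, tensor layouts, and the use of bilinear interpolation to read coarse feature maps at image-space coordinates are our assumptions.

```python
import torch
import torch.nn.functional as F

def gather_hypercolumns(feature_maps, pixel_coords, image_size):
    """Assemble hypercolumn descriptors for the sampled pixels.

    feature_maps : list of tensors [B, C_i, H_i, W_i] from the chosen layers
                   (e.g. c12, c22, c33, c43, c53, c7).
    pixel_coords : tensor [B, N, 2] of (row, col) locations sampled inside
                   the brain ROI, in input-image coordinates.
    image_size   : (H, W) of the padded input slice.
    Returns a tensor [B, N, sum_i C_i] of concatenated per-pixel features.
    """
    H, W = image_size
    # grid_sample expects (x, y) coordinates normalised to [-1, 1].
    ys = pixel_coords[..., 0].float() / (H - 1) * 2 - 1
    xs = pixel_coords[..., 1].float() / (W - 1) * 2 - 1
    grid = torch.stack([xs, ys], dim=-1).unsqueeze(2)            # [B, N, 1, 2]

    columns = []
    for fmap in feature_maps:
        # Bilinear interpolation implicitly upsamples coarse layers back
        # to image resolution at exactly the sampled locations.
        sampled = F.grid_sample(fmap, grid, align_corners=True)  # [B, C_i, N, 1]
        columns.append(sampled.squeeze(-1).permute(0, 2, 1))     # [B, N, C_i]
    return torch.cat(columns, dim=-1)

def soft_dice_loss(probs, targets, eps=1e-6):
    """Soft Dice loss over the sampled pixels, in the spirit of [20].
    probs, targets : tensors [B, N] of foreground probabilities / binary labels."""
    inter = (probs * targets).sum()
    return 1.0 - (2.0 * inter + eps) / (probs.sum() + targets.sum() + eps)
```

The concatenated descriptor of each sampled pixel is then passed through the three fully connected layers of the MLP, and the loss is computed only over the sampled pixels.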

3 Experiments

3.1 Dataset

The study cohort consists of CT scans of 89 patients with ICH from the Singapore General Hospital, with a mean age of 62.0 years (SD = 14.0), of whom 54 were men. Ethics approval was obtained from the SingHealth Centralized Institutional Review Board. The dataset also contains annotations of the hematoma region marked by two blinded assessors from the neurosurgery department. Disagreements regarding the annotations were resolved by discussion with a third assessor for final consensus. Pixels inside the segmentation contours delineating the hemorrhagic region are labeled 1, while healthy tissue and background are labeled 0. Finally, all scans are resampled to isotropic 1 mm\(^3\) resolution, skull-stripped as in [24], and normalized to the intensity range [0–255].
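As a rough illustration of this pre-processing, the sketch below resamples a scan to isotropic 1 mm\(^3\) and min-max normalizes it to [0, 255]. It assumes SimpleITK for I/O and resampling; the skull-stripping step of [24] is not reproduced here, and the function name is hypothetical.

```python
import SimpleITK as sitk
import numpy as np

def preprocess_ct(path, new_spacing=(1.0, 1.0, 1.0)):
    """Resample a CT volume to isotropic 1 mm^3 and normalise to [0, 255]."""
    img = sitk.ReadImage(path)
    old_size, old_spacing = img.GetSize(), img.GetSpacing()
    new_size = [int(round(osz * osp / nsp))
                for osz, osp, nsp in zip(old_size, old_spacing, new_spacing)]
    img = sitk.Resample(img, new_size, sitk.Transform(), sitk.sitkLinear,
                        img.GetOrigin(), new_spacing, img.GetDirection(),
                        0, img.GetPixelID())
    vol = sitk.GetArrayFromImage(img).astype(np.float32)
    # Min-max intensity normalisation (applied after skull stripping).
    vol = (vol - vol.min()) / max(vol.max() - vol.min(), 1e-6) * 255.0
    return vol
```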

3.2 Training

As our model is capable of training on predefined pixels, the dataset is not resized, in order to prevent loss of shape and contextual information. However, we slice the volumes along the axial plane and pad all slices to a common size of \(250 \times 250\) for the convolutional filters. The slices are augmented by random horizontal flipping to improve generalization. We observe that slices with fewer than 2000 brain-region pixels have no significance in training; hence, we remove these slices along with all blank slices. During the training phase, we sample 2000 pixels (N) per slice to extract multi-scale convolutional features and form the hypercolumn for MLP prediction. To minimize class skewness, we randomly choose an equal number of pixels from each class (1000 pixels per class). However, some slices contain no hemorrhagic region, in which case we sample 2000 (N) pixels randomly. In the testing phase, all pixels within the brain region are selected to form hypercolumns for MLP prediction.
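A sketch of this per-slice, class-balanced pixel sampling is given below; the function and variable names are hypothetical, and the fallback to uniform sampling handles slices without a hemorrhagic region.

```python
import numpy as np

def sample_training_pixels(label_slice, brain_mask, n_samples=2000, n_classes=2):
    """Sample (row, col) pixel coordinates inside the brain region of one slice.

    When every class is present, N/K pixels are drawn per class (1000 per class
    for N = 2000, K = 2); otherwise pixels are drawn uniformly over the brain mask.
    """
    per_class = n_samples // n_classes
    coords = []
    for k in range(n_classes):
        rows, cols = np.where((label_slice == k) & brain_mask)
        if rows.size == 0:                       # e.g. slice without hemorrhage
            coords = []
            break
        idx = np.random.choice(rows.size, per_class, replace=rows.size < per_class)
        coords.append(np.stack([rows[idx], cols[idx]], axis=1))
    if not coords:                               # uniform fallback over the brain
        rows, cols = np.where(brain_mask)
        idx = np.random.choice(rows.size, n_samples, replace=rows.size < n_samples)
        return np.stack([rows[idx], cols[idx]], axis=1)
    return np.concatenate(coords, axis=0)
```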

The model is trained using stochastic gradient descent with a mini-batch size of 5. Its parameters are initialized from a pre-trained VGG-16 [23] model, with a learning rate of 0.001 and momentum of 0.9. Our model is implemented using a modified version of the deep learning platform of [6], based on the CAFFE framework [15]. Training takes around 30 h for 40 epochs on a single Nvidia 1080 Ti GPU.

Table 1. 5-fold cross-validation results: Dice coefficients and Hausdorff distances for ICHNet.

3.3 Post-processing

To remove small spurious false positives and to smooth the predicted segmentation, we utilize a 3D fully connected Conditional Random Field (CRF) with Gaussian edge potentials, as proposed in [18, 19]. As the unary term of the CRF, we provide the probability map generated by the softmax layer at prediction time. The CRF regularizes the overall volume of the hemorrhage lesion while leaving the internal structure of the lesion mostly intact. Further, remaining 3D-connected regions smaller than 1000 voxels are removed using connected component analysis.
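The connected-component step can be sketched with scipy as below; this is an illustrative snippet, and the 3D CRF itself follows the implementation of [18, 19].

```python
import numpy as np
from scipy import ndimage

def remove_small_components(segmentation, min_voxels=1000):
    """Drop 3D-connected predicted lesion components smaller than min_voxels.
    `segmentation` is the binary volume obtained after CRF refinement."""
    labeled, n_components = ndimage.label(segmentation)
    cleaned = np.zeros_like(segmentation)
    for comp in range(1, n_components + 1):
        component = (labeled == comp)
        if component.sum() >= min_voxels:
            cleaned[component] = 1
    return cleaned
```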

3.4 Results

To assess the performance of the model comprehensively, the Dice coefficient, Hausdorff distance, sensitivity, and specificity are computed and presented in Table 1. The model is evaluated using 5-fold cross-validation, and the average over the folds is reported. For example, the maximum Dice score obtained is 89.05%, whereas the average over all folds is 87.60%. Our model also achieves an average Hausdorff distance of 11.76 and an average sensitivity of 91.51%. The specificity is almost 100% in all cases. Table 2 compares the performance and computational efficiency of our model with other similar approaches. For a fair comparison, we report the segmentation accuracy of the best trained model of each architecture, and the same pre-processing and post-processing techniques are applied to all architectures compared. One of the most important observations is that our model requires almost half the time and number of epochs to converge compared to multi-modal PixelNet [13] and PSPNet [27]. It also achieves the best Dice coefficient among all compared methods. However, PSPNet [27], one of the best performing techniques in computer vision applications, obtains the highest (worst) Hausdorff distance in this experiment. Figure 2 shows sample segmentations predicted by our model and by multi-modal PixelNet [13].
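For reference, these metrics can be computed on binary volumes roughly as in the numpy/scipy sketch below; this is not the evaluation code used for Table 1, and the surface-based Hausdorff computation is an assumption (distances are in voxels, i.e. mm after 1 mm isotropic resampling).

```python
import numpy as np
from scipy.ndimage import binary_erosion
from scipy.spatial.distance import directed_hausdorff

def dice(pred, gt, eps=1e-6):
    """Dice coefficient between binary volumes."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    return 2.0 * np.logical_and(pred, gt).sum() / (pred.sum() + gt.sum() + eps)

def sensitivity_specificity(pred, gt, eps=1e-6):
    """Voxel-wise sensitivity and specificity."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    tp = np.logical_and(pred, gt).sum()
    tn = np.logical_and(~pred, ~gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    return tp / (tp + fn + eps), tn / (tn + fp + eps)

def hausdorff(pred, gt):
    """Symmetric Hausdorff distance between the lesion surfaces."""
    surface = lambda m: np.argwhere(m & ~binary_erosion(m))
    p, g = surface(pred.astype(bool)), surface(gt.astype(bool))
    return max(directed_hausdorff(p, g)[0], directed_hausdorff(g, p)[0])
```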

Fig. 2.

Predictions of ICHNet (ours) compared with similar models. In the segmentations, the hemorrhage region is shown in red. (Color figure online)

Table 2. Performance and computational efficiency comparison of our model with the similar models multi-modal PixelNet [13] and PSPNet [27].

4 Discussion and Conclusion

We present a deep learning-based model, ICHNet, which predicts intracerebral hemorrhage (ICH) segmentations comparable to those of radiologists. In medical imaging, the anatomy of interest occupies only a very small region of the scan, which biases the model's predictions strongly towards the background. ICHNet can be trained using only pixels drawn from within the brain region, which improves both optimization time and segmentation performance. Another advantage of ICHNet is that it further reduces the effect of data skewness by utilizing the Dice coefficient as the objective function. Deep learning models for medical imaging applications using MRI are trained on multi-channel (multi-modality) data, unlike CT, which comprises only one channel and therefore carries less contextual information; this makes it challenging for the model to distinguish between healthy regions and the hemorrhage lesion. Future work includes building a 3D ICHNet model for different medical applications using MRI or CT data.