Abstract
We develop a deep learning approach for automated intracerebral hemorrhage (ICH) segmentation from 3D computed tomography (CT) scans. Our model, ICHNet, integrates a dilated convolutional neural network (CNN) with hypercolumn features: a modest number of pixels are sampled, and the corresponding features from multiple layers are concatenated. Because it samples pixels rather than image patches, the model can be trained within the brain region only and ignore the CT background padding. Learning only from healthy and affected brain tissue shortens convergence time and improves accuracy. To overcome the class imbalance problem, we sample an equal number of pixels from each class. We also incorporate a 3D conditional random field (3D CRF) as a post-processing step to smooth the predicted segmentation. ICHNet achieves a Dice score of 87.6% in hemorrhage segmentation, which is comparable to radiologists.
1 Introduction
Intracerebral hemorrhage (ICH) is a form of stroke associated with high mortality and morbidity [1, 16]. Most patients who survive a hemorrhagic stroke develop long-term disabilities as a result of compression of the brain tissue around the affected region, caused by edema [22]. Radiological imaging such as computed tomography (CT) is typically used for diagnosis, treatment planning, and prognosis monitoring of ICH patients. Traditionally, radiologists delineate the hematoma manually on the CT scan and estimate its initial volume, which is used to predict mortality and the functional outcome of the patient. This process is limited by the time that manual delineation takes, by inter-rater variability, and by the need for highly trained radiologists to be available at all times. Accurate automated segmentation is therefore important for a precise quantitative analysis of the hematoma.
Recently, deep learning based automated segmentation approaches have gained momentum, as they can perform complex tasks very quickly and with accuracy similar to that of a human specialist. Recent examples include brain tumor segmentation [11], ischemic lesion segmentation [5], lung tumor segmentation [12], cardiac segmentation [25] and pancreas segmentation [4]. In particular, automated stroke lesion segmentation has received increasing attention in stroke management, because it can handle large amounts of data and support clinicians in making numerous complex decisions. Choi et al. [8] propose an ensemble of deep neural networks for automated prognosis of post-treatment ischemic stroke. To overcome the computational burden of 3D ischemic MRI scans, Kamnitsas et al. [17] devise a 3D CNN with a dense training scheme that processes adjacent image patches in one pass while automatically adapting to the inherent class imbalance. Chen et al. [5] combine two CNNs, DeconvNets [21] and a multi-scale convolutional label evaluation net, to segment acute ischemic lesions from diffusion-weighted MR imaging (DWI). All these models aim for state-of-the-art performance by using different 2D, 3D and dual-path CNN architectures with handcrafted features and a CRF as post-processing. However, none of these methods has been applied to CT scans for intracerebral hemorrhage (ICH) segmentation. RADnet [9], on the other hand, uses a recurrent attention DenseNet [14] with LSTM to segment and classify brain hemorrhage from CT scans, but in the setting of traumatic brain injury (TBI).
In this paper, we propose a novel deep learning model (ICHNet) with a brain mask training scheme to segment intracerebral hemorrhage (ICH). In pixel-level segmentation with a convolutional predictor (for example, a CNN), stochastic gradient descent (SGD) treats the training data as independent and predicts each pixel separately [6]. In addition, max-pooling and striding make the higher layers spatially insensitive, which limits spatial accuracy in pixel-wise segmentation. To reduce this problem, Hariharan et al. [10] extract the features of the same pixel from multiple layers and stack them into a vector called a "hypercolumn". Bansal et al. [3] randomly sample a moderate number of pixels in the training phase to bound memory use and to reduce overfitting caused by the feature correlation of spatially neighboring pixels. PixelNet [2] combines the hypercolumn technique [10] and random sampling [3] to form a hypercolumn descriptor for each sampled pixel from multiple convolutional layers. Subsequently, Islam et al. [13] use a multi-modal PixelNet to segment brain tumors from MRI scans and achieve state-of-the-art performance. DeepLab [6], PSPNet [27], and ICNet [26] adopt "atrous convolution" [7] to explicitly control the resolution and incorporate larger context without increasing the number of parameters or the amount of computation. Our work is inspired by multi-modal PixelNet [2, 13] and atrous convolution [6, 7, 26, 27], which we combine to design a computationally efficient, state-of-the-art learning model for intracerebral hemorrhage (ICH) segmentation. The main contributions of our work are fourfold: (1) To our knowledge, this is the first work on automated intracerebral hemorrhage (ICH) segmentation from CT scans using deep learning; (2) The proposed model can be trained by sampling only a modest number of pixels from within the brain region, whereas conventional deep learning approaches use the whole image or image patches including the background. Because the background and padding of the images are excluded from learning, the model converges faster and predicts more accurately; (3) Class imbalance in the training dataset biases the convolutional prediction towards certain classes. We deal with this problem by sampling an equal number of pixels from each class; (4) Compared to multi-modal PixelNet [13], we add atrous convolution layers and a Dice loss layer for prediction, and a 3D CRF and largest component analysis as post-processing.
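The property of atrous convolution referred to above, a larger context with the same number of weights, can be illustrated with a minimal one-dimensional sketch. The function name `atrous_conv1d` and the toy signal are assumptions made for this example; they are not part of ICHNet or of the cited implementations.

```python
import numpy as np

def atrous_conv1d(signal, kernel, rate=1):
    """Valid 1D cross-correlation whose kernel taps are `rate` samples apart."""
    signal = np.asarray(signal, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    # Number of input samples covered by one output value (receptive field).
    span = (len(kernel) - 1) * rate + 1
    n_out = len(signal) - span + 1
    if n_out <= 0:
        raise ValueError("signal is shorter than the dilated kernel")
    out = np.zeros(n_out)
    for k, weight in enumerate(kernel):
        # Tap k reads the input shifted by k * rate samples.
        out += weight * signal[k * rate : k * rate + n_out]
    return out

x = np.arange(10.0)
w = np.array([1.0, 1.0, 1.0])
dense = atrous_conv1d(x, w, rate=1)    # 3 weights, each output spans 3 samples
dilated = atrous_conv1d(x, w, rate=2)  # same 3 weights, each output spans 5 samples
```

With `rate=2` every output value depends on a window of five input samples although the kernel still has three weights, which is the sense in which context grows without additional parameters.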
2 Proposed Method
Our proposed model (Fig. 1) samples diverse pixels from an ROI (the brain region) and constructs a hypercolumn (\(h_p\)) from multi-scale convolutional and atrous convolutional layer features, as in past work [2, 10, 13]. It contains 15 convolutional layers in total: the first 13 layers (\(c_{i,j}\)) are similar to the convolutional part of VGG-16 [23] (Convolution, ReLU, Pooling), and the last 2 convolutional layers (\(c_i\)) follow [19]. We integrate atrous convolution (\(ac\)) as in PSPNet [27]. To predict the pixel-wise segmentation from hypercolumn features, we use a multi-layer perceptron (MLP) with 3 fully connected (fc) layers of size 4096 followed by ReLU activation functions. The convolutional and fully connected layers of our architecture can be denoted as {\(c_{11}\), \(c_{12}\), \(c_{21}\), \(c_{22}\), \(c_{31}\), \(c_{32}\), \(c_{33}\), \(c_{41}\), \(ac_{42}\), \(c_{43}\), \(c_{51}\), \(ac_{52}\), \(c_{53}\), \(c_6\), \(c_7\), \(h_p\), \(fc_1\), \(fc_2\), \(fc_3\)}. Hypercolumn features are extracted from the 6 convolutional layers {\(c_{12}\), \(c_{22}\), \(c_{33}\), \(c_{43}\), \(c_{53}\), \(c_7\)}. Because our model learns inside a predefined ROI, we denote the hypercolumn as \(h_{p\_ROI}\), where \(p\_ROI\) is a random pixel inside the ROI. We can therefore formulate the hypercolumn as:
\[h_{p\_ROI} = \left[c_{1(p\_ROI)}, c_{2(p\_ROI)}, \ldots, c_{M(p\_ROI)}\right]\]
where \(c_{i(p\_ROI)}\) denotes the feature vector of the pixel \(p\_ROI\) from the \(i^{th}\) of the \(M = 6\) convolutional layers listed above. The key component of our model is an extra layer called 'pixels', which carries the coordinates of the pixels we want to train on. This layer gives the model the freedom to choose random pixels inside the ROI. It can also select an equal number of pixels from each class, which helps to overcome the data skewness problem. If N is the number of sampled pixels and there are K classes in the dataset, we choose N/K pixels from each class to form the hypercolumns. We also adopt a Dice loss function similar to [20] to counter the class imbalance problem.
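As an illustration of how a hypercolumn can be assembled for a set of sampled pixels, the following NumPy sketch looks up, for each pixel coordinate, the feature vector at the corresponding location of every selected layer and concatenates the results. It is not the Caffe implementation used in this work: the function name `hypercolumn`, the nearest-neighbour lookup (interpolation between grid locations is another option), and the toy feature-map shapes are assumptions made for this example.

```python
import numpy as np

def hypercolumn(feature_maps, coords, image_size):
    """Concatenate per-pixel features taken from several layers.

    feature_maps: list of arrays shaped (C_i, H_i, W_i), one per selected layer.
    coords: integer array of shape (N, 2) with (row, col) image positions.
    image_size: (H, W) of the input image.
    Returns an array of shape (N, C_1 + C_2 + ...).
    """
    coords = np.asarray(coords)
    height, width = image_size
    columns = []
    for fmap in feature_maps:
        _, h, w = fmap.shape
        # Map image coordinates onto this layer's coarser grid (nearest neighbour).
        rows = np.minimum(coords[:, 0] * h // height, h - 1)
        cols = np.minimum(coords[:, 1] * w // width, w - 1)
        columns.append(fmap[:, rows, cols].T)  # shape (N, C_i)
    return np.concatenate(columns, axis=1)

# Toy feature maps standing in for three layers at decreasing resolution.
rng = np.random.default_rng(0)
maps = [
    rng.normal(size=(64, 250, 250)),
    rng.normal(size=(128, 125, 125)),
    rng.normal(size=(512, 16, 16)),
]
pixels = np.array([[0, 0], [120, 130], [249, 249]])
h = hypercolumn(maps, pixels, (250, 250))  # one row of 64 + 128 + 512 features per pixel
```

Each row of `h` would then be passed to the MLP as the descriptor of one sampled pixel.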
3 Experiments
3.1 Dataset
The study cohort consists of CT scans of 89 patients with ICH from the Singapore General Hospital (mean age 62.0 years, SD = 14.0), of whom 54 were men. Ethics approval was obtained from the SingHealth Centralized Institutional Review Board. The dataset also includes annotations of the hematoma region marked by two blinded assessors from the neurosurgery department. Disagreements regarding the annotations were resolved through discussion with a third assessor to reach a final consensus. Pixels inside the segmentation contours that delineate the hemorrhagic region are labeled 1, while healthy tissue and background are labeled 0. Finally, all scans are resampled to an isotropic resolution of 1 mm\(^3\), skull-stripped following [24], and normalized to the intensity range [0, 255].
3.2 Training
Because our model can be trained on predefined pixels, the dataset is not resized, which prevents the loss of shape and contextual information. However, we slice each volume along the axial plane and pad all slices to a common size of \(250 \times 250\) for the convolutional filters. The slices are augmented by random horizontal flipping to improve generalization. We observe that slices containing fewer than 2000 brain-region pixels do not contribute meaningfully to training, so we remove these slices along with all blank slices. During the training phase, we sample 2000 pixels (N) per slice to extract multi-scale convolutional features and form hypercolumns for MLP prediction. To minimize class skewness, we randomly choose an equal number of pixels from each class (1000 pixels per class). Some slices, however, contain no hemorrhagic region, in which case we sample the 2000 (N) pixels randomly. In the testing phase, all pixels within the brain region are selected to form hypercolumns for MLP prediction.
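The sampling scheme described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the 'pixels' layer of our Caffe implementation: the function name `sample_pixels`, its parameters, and the choice to sample with replacement when a class has fewer pixels than requested are all hypothetical. The brain mask is assumed to be non-empty, which holds after the slice filtering described above.

```python
import numpy as np

def sample_pixels(label, brain_mask, n_samples=2000, n_classes=2, rng=None):
    """Return an (n_samples, 2) array of (row, col) training coordinates.

    Pixels are drawn only from inside `brain_mask`. When every class is present,
    n_samples // n_classes pixels are drawn per class; otherwise all pixels are
    drawn uniformly from the brain region.
    """
    rng = np.random.default_rng() if rng is None else rng
    label = np.asarray(label)
    inside = np.asarray(brain_mask, dtype=bool)
    per_class = n_samples // n_classes
    pools = [np.argwhere(inside & (label == k)) for k in range(n_classes)]

    if any(len(pool) == 0 for pool in pools):
        # A class is missing in this slice (e.g. no hemorrhage): sample at random.
        pool = np.argwhere(inside)
        idx = rng.choice(len(pool), size=n_samples, replace=len(pool) < n_samples)
        return pool[idx]

    picks = []
    for pool in pools:
        # Assumption: fall back to sampling with replacement for small classes.
        idx = rng.choice(len(pool), size=per_class, replace=len(pool) < per_class)
        picks.append(pool[idx])
    return np.concatenate(picks, axis=0)

rng = np.random.default_rng(0)
mask = np.zeros((250, 250), dtype=bool)
mask[50:200, 50:200] = True                     # toy brain region
label = np.zeros((250, 250), dtype=np.uint8)
label[100:140, 100:140] = 1                     # toy hemorrhage region
coords = sample_pixels(label, mask, rng=rng)    # 1000 pixels per class
fallback = sample_pixels(np.zeros_like(label), mask, rng=rng)  # slice without hemorrhage
```

Because coordinates are drawn from the mask, padded background pixels never enter a training batch.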
The model is trained using stochastic gradient descent with a mini-batch size of 5, a learning rate of 0.001 and a momentum of 0.9. Its parameters are initialized from a pre-trained VGG-16 [23] model. Our model is implemented using a modified version of the deep learning platform of [6], which is based on the Caffe framework [15]. Training takes around 30 h for 40 epochs on a single Nvidia 1080Ti GPU.
3.3 Post-processing
To remove small spurious false positives and to smooth the predicted segmentation, we use a 3D fully connected Conditional Random Field (CRF) with Gaussian edge potentials as proposed in [18, 19]. As the unary term of the CRF, we provide the probability map generated by the softmax layer at prediction time. The CRF regularizes the overall volume of the hemorrhage lesion while leaving its internal structure mostly intact. Remaining 3D-connected regions smaller than 1000 voxels are then removed using connected component analysis.
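The final filtering step can be sketched as below. This is an illustrative, self-contained version rather than the code used in our experiments: the function name `remove_small_components` is hypothetical, and 6-connectivity is an assumption, since the neighbourhood definition is not fixed above. For full-size volumes a library routine such as `scipy.ndimage.label` would be considerably faster than this pure Python traversal.

```python
from collections import deque

import numpy as np

def remove_small_components(mask, min_voxels=1000):
    """Keep only 6-connected foreground regions with at least `min_voxels` voxels."""
    mask = np.asarray(mask, dtype=bool)
    keep = np.zeros_like(mask)
    visited = np.zeros_like(mask)
    offsets = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

    for seed in zip(*np.nonzero(mask)):
        if visited[seed]:
            continue
        # Breadth-first search collects one connected region.
        visited[seed] = True
        component, queue = [seed], deque([seed])
        while queue:
            z, y, x = queue.popleft()
            for dz, dy, dx in offsets:
                n = (z + dz, y + dy, x + dx)
                in_bounds = all(0 <= c < s for c, s in zip(n, mask.shape))
                if in_bounds and mask[n] and not visited[n]:
                    visited[n] = True
                    component.append(n)
                    queue.append(n)
        if len(component) >= min_voxels:
            keep[tuple(np.array(component).T)] = True
    return keep

volume = np.zeros((20, 20, 20), dtype=bool)
volume[2:12, 2:12, 2:12] = True     # 1000-voxel region, kept
volume[15:17, 15:17, 15:17] = True  # 8-voxel region, removed
cleaned = remove_small_components(volume)
```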
3.4 Results
To assess the performance of the model comprehensively, the Dice coefficient, Hausdorff distance, sensitivity, and specificity are computed and presented in Table 1. The model is evaluated using 5-fold cross-validation, and we report the average over the folds. For example, the maximum Dice score obtained is 89.05%, whereas the average over all folds is 87.60%. Our model also achieves an average Hausdorff distance of 11.76 and an average sensitivity of 91.51%. The specificity is almost 100% in all cases. Table 2 compares the performance and computational efficiency of our model with other similar approaches. For a fair comparison, we report the segmentation accuracy of the best trained model of each architecture, and the same pre-processing and post-processing techniques are applied to all architectures compared. One of the most important observations is that our model requires almost half the time and the number of epochs to converge compared to multi-modal PixelNet [13] and PSPNet [27]. It also achieves the best Dice coefficient among all methods. However, PSPNet [27], one of the best performing techniques in computer vision applications, obtains the highest Hausdorff distance in this experiment. Figure 2 shows some predicted segmentations from our model alongside those of multi-modal PixelNet [13].
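For reference, the overlap-based metrics named above can be computed from binary masks as in the following sketch. The function name `segmentation_metrics` and the toy masks are assumptions made for this example; the Hausdorff distance is omitted, and the values produced here are unrelated to those in Table 1.

```python
import numpy as np

def segmentation_metrics(pred, truth):
    """Dice, sensitivity and specificity of a binary prediction against ground truth."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.count_nonzero(pred & truth)    # hemorrhage predicted as hemorrhage
    fp = np.count_nonzero(pred & ~truth)   # background predicted as hemorrhage
    fn = np.count_nonzero(~pred & truth)   # hemorrhage that was missed
    tn = np.count_nonzero(~pred & ~truth)  # background predicted as background

    def ratio(num, den):
        # Undefined ratios (empty denominators) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "dice": ratio(2 * tp, 2 * tp + fp + fn),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
    }

truth = np.zeros((10, 10), dtype=bool)
truth[2:6, 2:6] = True   # 16 ground-truth pixels
pred = np.zeros((10, 10), dtype=bool)
pred[3:7, 2:6] = True    # same shape, shifted down by one row
scores = segmentation_metrics(pred, truth)
```

Because true negatives dominate whenever the lesion is small relative to the image, specificity stays close to 1 even for imperfect predictions, which is consistent with the near-100% specificity reported above.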
4 Discussion and Conclusion
We present ICHNet, a deep learning-based model that segments intracerebral hemorrhage (ICH) with accuracy comparable to radiologists. In medical imaging, the anatomy of interest often occupies only a very small region of the scan, which strongly biases the model's predictions towards the background. ICHNet can be trained using only pixels from within the brain region, which improves both optimization time and segmentation performance. Another advantage of ICHNet is that it further reduces the effect of data skewness by using the Dice coefficient as the objective function. Deep learning models for MRI-based applications are trained on multi-channel (multi-modality) data, whereas CT provides only one channel and therefore less contextual information. This makes it challenging for the model to distinguish between healthy regions and hemorrhage lesions. Future work includes building a 3D ICHNet model for different medical applications using MRI or CT data.
References
1. van Asch, C.J., Luitse, M.J., Rinkel, G.J., van der Tweel, I., Algra, A., Klijn, C.J.: Incidence, case fatality, and functional outcome of intracerebral haemorrhage over time, according to age, sex, and ethnic origin: a systematic review and meta-analysis. Lancet Neurol. 9(2), 167–176 (2010)
2. Bansal, A., Chen, X., Russell, B., Ramanan, A.G., et al.: PixelNet: representation of the pixels, by the pixels, and for the pixels. arXiv preprint arXiv:1702.06506 (2017)
3. Bansal, A., Russell, B., Gupta, A.: Marr revisited: 2D-3D alignment via surface normal prediction. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5965–5974 (2016)
4. Cai, J., Lu, L., Xie, Y., Xing, F., Yang, L.: Improving deep pancreas segmentation in CT and MRI images via recurrent neural contextual learning and direct loss function. arXiv preprint arXiv:1707.04912 (2017)
5. Chen, L., Bentley, P., Rueckert, D.: Fully automatic acute ischemic lesion segmentation in DWI using convolutional neural networks. NeuroImage: Clin. 15, 633–643 (2017)
6. Chen, L.C., Papandreou, G., Kokkinos, I., Murphy, K., Yuille, A.L.: DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 40(4), 834–848 (2018)
7. Chen, L.C., Papandreou, G., Schroff, F., Adam, H.: Rethinking atrous convolution for semantic image segmentation. arXiv preprint arXiv:1706.05587 (2017)
8. Choi, Y., Kwon, Y., Lee, H., Kim, B.J., Paik, M.C., Won, J.H.: Ensemble of deep convolutional neural networks for prognosis of ischemic stroke. In: Crimi, A., Menze, B., Maier, O., Reyes, M., Winzeck, S., Handels, H. (eds.) BrainLes 2016. LNCS, vol. 10154, pp. 231–243. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-55524-9_22
9. Grewal, M., Srivastava, M.M., Kumar, P., Varadarajan, S.: RADNET: radiologist level accuracy using deep learning for hemorrhage detection in CT scans. arXiv preprint arXiv:1710.04934 (2017)
10. Hariharan, B., Arbeláez, P., Girshick, R., Malik, J.: Hypercolumns for object segmentation and fine-grained localization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 447–456 (2015)
11. Havaei, M., et al.: Brain tumor segmentation with deep neural networks. Med. Image Anal. 35, 18–31 (2017)
12. Hwang, S., Park, S.: Accurate lung segmentation via network-wise training of convolutional networks. In: Cardoso, M.J., et al. (eds.) DLMIA/ML-CDS -2017. LNCS, vol. 10553, pp. 92–99. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-67558-9_11
13. Islam, M., Ren, H.: Multi-modal PixelNet for brain tumor segmentation. In: Crimi, A., Bakas, S., Kuijf, H., Menze, B., Reyes, M. (eds.) BrainLes 2017. LNCS, vol. 10670, pp. 298–308. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-75238-9_26
14. Jégou, S., Drozdzal, M., Vazquez, D., Romero, A., Bengio, Y.: The one hundred layers tiramisu: fully convolutional densenets for semantic segmentation. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 1175–1183. IEEE (2017)
15. Jia, Y., et al.: Caffe: convolutional architecture for fast feature embedding. In: Proceedings of the 22nd ACM International Conference on Multimedia, pp. 675–678. ACM (2014)
16. Kalita, J., Misra, U., Vajpeyee, A., Phadke, R., Handique, A., Salwani, V.: Brain herniations in patients with intracerebral hemorrhage. Acta Neurol. Scand. 119(4), 254–260 (2009)
17. Kamnitsas, K., et al.: Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation. Med. Image Anal. 36, 61–78 (2017)
18. Krähenbühl, P., Koltun, V.: Efficient inference in fully connected CRFs with Gaussian edge potentials. In: Advances in Neural Information Processing Systems, pp. 109–117 (2011)
19. Long, J., Shelhamer, E., Darrell, T.: Fully convolutional models for semantic segmentation. In: CVPR, vol. 3, p. 4 (2015)
20. Milletari, F., Navab, N., Ahmadi, S.A.: V-net: fully convolutional neural networks for volumetric medical image segmentation. In: 2016 Fourth International Conference on 3D Vision (3DV), pp. 565–571. IEEE (2016)
21. Noh, H., Hong, S., Han, B.: Learning deconvolution network for semantic segmentation. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 1520–1528 (2015)
22. Saulle, M.F., Schambra, H.M.: Recovery and rehabilitation after intracerebral hemorrhage. In: Seminars in Neurology, vol. 36, p. 306. NIH Public Access (2016)
23. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
24. Smith, S.M.: Fast robust automated brain extraction. Hum. Brain Mapp. 17(3), 143–155 (2002)
25. Tran, P.V.: A fully convolutional neural network for cardiac segmentation in short-axis MRI. arXiv preprint arXiv:1604.00494 (2016)
26. Zhao, H., Qi, X., Shen, X., Shi, J., Jia, J.: ICNet for real-time semantic segmentation on high-resolution images. arXiv preprint arXiv:1704.08545 (2017)
27. Zhao, H., Shi, J., Qi, X., Wang, X., Jia, J.: Pyramid scene parsing network. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2881–2890 (2017)
Acknowledgement
This work is supported by the Singapore Academic Research Fund under Grant R-397-000-227-112, NUSRI China Jiangsu Provincial Grant BK20150386 and BE2016077 and NMRC Bedside & Bench under grant R-397-000-245-511 awarded to Dr. Hongliang Ren.
© 2019 Springer Nature Switzerland AG
Cite this paper
Islam, M., Sanghani, P., See, A.A.Q., James, M.L., King, N.K.K., Ren, H. (2019). ICHNet: Intracerebral Hemorrhage (ICH) Segmentation Using Deep Learning. In: Crimi, A., Bakas, S., Kuijf, H., Keyvan, F., Reyes, M., van Walsum, T. (eds) Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries. BrainLes 2018. Lecture Notes in Computer Science(), vol 11383. Springer, Cham. https://doi.org/10.1007/978-3-030-11723-8_46
DOI: https://doi.org/10.1007/978-3-030-11723-8_46
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-11722-1
Online ISBN: 978-3-030-11723-8
eBook Packages: Computer Science (R0)