1 Introduction and Related Work

Ischemic stroke is one of the most frequent cerebrovascular diseases and a leading cause of death and disability worldwide. Defining the location and extent of irreversibly damaged brain tissue is a critical part of the decision-making process in acute stroke. Infarcted tissue differs from normal tissue in its diffusion and perfusion characteristics, which makes diffusion and perfusion MR imaging the gold standard for assessing ischemic lesions. Recently, owing to its speed, availability and lack of contraindications, Computed Tomography (CT) has been used to guide treatment decisions in ischemic stroke. Although the contrast of ischemic stroke lesions is poor in brain CT images, CT perfusion imaging can be exploited to identify and locate them. The maps derived from CT perfusion imaging comprise Cerebral Blood Flow (CBF), Cerebral Blood Volume (CBV), time to peak of the residue function (Tmax) and Mean Transit Time (MTT). CBV is defined as the volume of blood in a given amount of brain tissue. CBF is defined as the volume of blood passing through a given amount of brain tissue per unit time. MTT is the ratio of CBV to CBF.
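As an illustrative reading of these definitions, the relation between the three quantities can be written out with example numbers. The CBV and CBF values below are assumed purely for the arithmetic and are not taken from this work:

$$\begin{aligned} MTT = \frac{CBV}{CBF}, \qquad \text{e.g.}\quad \frac{4\ \text{mL}/100\,\text{g}}{50\ \text{mL}/100\,\text{g}/\text{min}} = 0.08\ \text{min} = 4.8\ \text{s} \end{aligned}$$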

The work presented in this paper is an attempt to develop an algorithm for fully automatic segmentation of ischemic stroke lesions using CT perfusion maps. Several perfusion maps (CBV, CBF, MTT, Tmax) were used as inputs to the algorithm, whereas the ground truth was generated from MRI Diffusion Weighted Imaging (DWI) of the ischemic stroke lesion. On DWI maps, infarcted brain tissue appears as hyper-intense regions.

Convolutional neural networks (CNNs) [7] have been applied to a wide variety of pattern recognition tasks, most commonly image classification [2, 6, 12] and semantic segmentation using fully convolutional networks (FCN) [8]. CNNs have also been applied to medical image segmentation and classification [1, 9]. In this paper, we propose a CNN-based architecture for segmentation of the stroke lesion from CT perfusion maps and CT images of the brain. Our network’s connectivity pattern was inspired by DenseNets [3].

2 Materials and Methods

2.1 Data Pre-processing Pipeline

Skull Stripping and Brain Mask Generation. Skull stripping is an important pre-processing step in brain image analysis, in which non-brain tissues such as the skull, eyes, scalp and dura are removed. This step helps to improve segmentation accuracy and to lower the execution time of segmentation algorithms. The Brain Extraction Tool (BET) [13] was used to remove the skull and other non-brain tissues. This was followed by Hounsfield windowing (0–100) of the brain tissues to improve contrast.

A brain mask was generated from the skull-stripped CT images of the brain by simple thresholding. The brain mask was used to remove non-brain regions from the CBF, CBV, MTT and Tmax maps. The steps of the data pre-processing pipeline are illustrated in Fig. 1.

Fig. 1. Data pre-processing steps: (a) CT image after Hounsfield windowing, (b) Skull-stripped image, (c) Mask generated using the skull-stripped image, (d) Cerebral blood flow information, (e) Cerebral blood volume information, (f) MTT, (g) Tmax and (h) Ground-truth image
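The windowing and masking steps described above can be sketched roughly as follows. This is a minimal NumPy illustration, not the pipeline used in this work: it assumes that BET has already set non-brain voxels to 0, and the function names and the mask threshold of 0 are hypothetical choices, since the text only states that simple thresholding was used.

```python
import numpy as np

def window_hu(ct, lo=0.0, hi=100.0):
    """Clip CT intensities to the [lo, hi] Hounsfield window."""
    return np.clip(ct.astype(np.float32), lo, hi)

def brain_mask(windowed_ct, threshold=0.0):
    """Binary brain mask by simple thresholding (threshold value is an assumption)."""
    return windowed_ct > threshold

def apply_mask(perfusion_map, mask):
    """Zero out non-brain voxels in a perfusion map (CBF, CBV, MTT or Tmax)."""
    return np.where(mask, perfusion_map, 0.0)

# Toy 2x2 "slice": 0 marks voxels assumed to have been removed by BET.
stripped = np.array([[0.0, 35.0], [120.0, -20.0]])
windowed = window_hu(stripped)       # values clipped into [0, 100]
mask = brain_mask(windowed)          # True where the windowed value is > 0
cbf = np.array([[5.0, 40.0], [60.0, 7.0]])
masked_cbf = apply_mask(cbf, mask)   # CBF kept only inside the mask
```

Note that with a threshold of 0, brain voxels whose intensity is at or below 0 HU would also fall outside the mask; a real pipeline might fill or smooth the mask to avoid this.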

Data Normalization. Volumes were normalized to have zero mean and unit variance using Eq. 1.

$$\begin{aligned} X_{norm} = \frac{X-\mu }{\sigma } \end{aligned}$$
(1)

where X is the data, and \(\mu \) and \( \sigma \) are the global mean and global standard deviation associated with X. These were calculated from the entire training dataset.
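A small sketch of Eq. 1 is given below, under the assumption that the global statistics are computed once over all voxels of the training volumes (presumably per modality, although the text does not say so) and then reused unchanged. The function names and the small constant `eps` guarding against division by zero are additions for illustration only.

```python
import numpy as np

def fit_global_stats(train_volumes):
    """Global mean and standard deviation over all voxels of all training volumes."""
    voxels = np.concatenate([np.asarray(v, dtype=np.float64).ravel()
                             for v in train_volumes])
    return voxels.mean(), voxels.std()

def normalize(volume, mu, sigma, eps=1e-8):
    """Eq. 1: X_norm = (X - mu) / sigma; eps is an added numerical safeguard."""
    return (np.asarray(volume, dtype=np.float64) - mu) / (sigma + eps)

# Toy "training set" of two tiny volumes.
train = [np.array([0.0, 2.0]), np.array([4.0, 6.0])]
mu, sigma = fit_global_stats(train)
normed = np.concatenate([normalize(v, mu, sigma) for v in train])
```

The same `mu` and `sigma` would then be applied to validation and test volumes, so that no statistics are estimated from held-out data.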

2.2 Proposed Network

We utilized an encoder-decoder architecture for the task of segmenting ischemic lesions from the CT perfusion maps. The encoder used a dense connectivity pattern and was inspired by the DenseNet-121 architecture. The decoder was composed of bilinear up-sampling modules and convolutional layers. Features learnt in the down-sampling path were concatenated with the features learnt in the up-sampling path using long skip connections. The architecture of the network is illustrated in Fig. 2.

Fig. 2. Proposed network

The input to the network comprised 5 channels: the CT image and the 4 corresponding CT perfusion maps. The first convolutional layer was composed of 64 kernels of size 7 \(\times \) 7. The resultant feature maps were then passed through batch normalization [4] and a non-linearity (ReLU) [11]. This was followed by a max pooling layer with the kernel size and stride set to \(3\times 3\) and 2 respectively, which reduced the spatial dimension of the generated features. The features were then passed through a series of dense blocks and transition layers. A dense block was composed of convolutional layers, wherein each convolutional layer received as input the feature maps of all the preceding layers in the block (Fig. 3). Within a dense block, each convolutional layer was preceded by batch normalization and a non-linearity and followed by a 2-D dropout layer (p = 0.5) (Fig. 4). The number of convolutional layers in a dense block was a hyper-parameter. In this work, the network was composed of 4 dense blocks with 6, 12, 24 and 16 convolutional layers respectively. The Transition Down block was composed of a \([1\times 1]\) convolutional layer followed by dropout with \(p = 0.5\) and a \([2 \times 2]\) max pooling layer, and reduced the spatial dimension of the features learnt by the network (Fig. 5).

Fig. 3. Dense block with 3 layers

Fig. 4. Layer of dense block

Fig. 5. Transition block
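To make the encoder layout more concrete, the following sketch tracks the number of feature channels and the spatial size through the stem, the four dense blocks and the transition blocks. Several values are assumptions borrowed from the standard DenseNet-121 configuration rather than stated in this work: a growth rate of 32, a stride of 2 and padding of 3 in the first convolution, padding of 1 in the first max pooling layer, and transition blocks that halve the channel count. The 256 \(\times \) 256 input size is hypothetical.

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

def encoder_shapes(in_size=256, init_features=64, growth_rate=32,
                   block_layers=(6, 12, 24, 16), compression=0.5):
    """Return (stage name, channels, spatial size) for each encoder stage."""
    shapes = []
    size = conv_out(in_size, kernel=7, stride=2, padding=3)  # first 7x7 conv (assumed stride 2)
    size = conv_out(size, kernel=3, stride=2, padding=1)     # 3x3 max pooling, stride 2
    channels = init_features
    shapes.append(("stem", channels, size))
    for i, n_layers in enumerate(block_layers, start=1):
        # Dense connectivity: every layer adds `growth_rate` channels to the
        # concatenated feature maps seen by the following layers.
        channels += n_layers * growth_rate
        shapes.append((f"dense_block_{i}", channels, size))
        if i < len(block_layers):
            # Transition Down: 1x1 conv (assumed to halve channels), then 2x2 max pooling.
            channels = int(channels * compression)
            size = conv_out(size, kernel=2, stride=2, padding=0)
            shapes.append((f"transition_{i}", channels, size))
    return shapes

shapes = encoder_shapes()
for name, channels, size in shapes:
    print(f"{name:15s} channels={channels:4d} size={size}x{size}")
```

Under these assumptions the encoder ends with 1024 channels at 1/32 of the input resolution, which is the feature map the decoder would then up-sample.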

2.3 Loss Function

Lesions occupy a minuscule proportion of the voxels in a medical volume, which leads to class imbalance. This issue was addressed by training the network to minimize a hybrid loss function comprising weighted cross entropy and Dice loss [10].

The Dice coefficient is an overlap metric used for assessing the quality of segmentation maps. The Dice coefficient between two binary volumes can be written as:

$$\begin{aligned} DICE =\frac{2\sum _{i}^{N}p_ig_i}{\sum _i^Np_i^2+\sum _i^Ng_i^2} \end{aligned}$$
(2)

where the sums run over the N voxels of the predicted binary segmentation volume \(p_i \in P\) and the ground truth binary volume \(g_i \in G\).
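Eq. 2 translates almost directly into code. The sketch below is a plain NumPy version; the function name and the small constant `eps`, which avoids division by zero when both volumes are empty, are additions for illustration and are not part of Eq. 2.

```python
import numpy as np

def dice_coefficient(p, g, eps=1e-7):
    """Eq. 2: 2 * sum(p_i * g_i) / (sum(p_i^2) + sum(g_i^2))."""
    p = np.asarray(p, dtype=np.float64).ravel()
    g = np.asarray(g, dtype=np.float64).ravel()
    numerator = 2.0 * np.sum(p * g)
    denominator = np.sum(p ** 2) + np.sum(g ** 2)
    return (numerator + eps) / (denominator + eps)

# Prediction marks two voxels, ground truth marks one of them:
# numerator = 2 * 1, denominator = 2 + 1, so Dice is 2/3.
dice = dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0])
```

Because the expression is differentiable in `p`, the same formula can be evaluated on soft probabilities during training, which is how it is typically used as a loss.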

The parameters of the network were optimized so as to minimize the \(total\_loss\), Eq. (3).

$$\begin{aligned} {total\_loss} = \lambda (cross\_entropy\_loss) + \gamma (dice\_loss\_bg) + \delta (dice\_loss\_fg) \end{aligned}$$
(3)

where \(\lambda \), \(\gamma \) and \(\delta \) are empirically assigned weights for the individual losses, and fg and bg denote the foreground voxels (lesion regions) and the background voxels (non-lesion regions) respectively. In this work we set \(\lambda = 0.50\), \(\gamma = 0.25\) and \(\delta = 0.25\).
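One possible reading of Eq. (3) is sketched below in NumPy. It assumes that each Dice loss is defined as one minus the Dice coefficient of Eq. 2, and that the background terms are obtained by complementing the foreground probabilities and labels. The class weights `w_fg` and `w_bg` of the cross entropy are not given in the text, so they default to 1 here and are placeholders only.

```python
import numpy as np

def soft_dice(p, g, eps=1e-7):
    """Dice coefficient of Eq. 2 evaluated on probabilities p and labels g."""
    return (2.0 * np.sum(p * g) + eps) / (np.sum(p ** 2) + np.sum(g ** 2) + eps)

def weighted_cross_entropy(p_fg, g, w_fg=1.0, w_bg=1.0, eps=1e-7):
    """Class-weighted binary cross entropy (weights are assumed values)."""
    p_fg = np.clip(p_fg, eps, 1.0 - eps)
    per_voxel = -(w_fg * g * np.log(p_fg) + w_bg * (1.0 - g) * np.log(1.0 - p_fg))
    return per_voxel.mean()

def total_loss(p_fg, g, lam=0.50, gamma=0.25, delta=0.25, w_fg=1.0, w_bg=1.0):
    """Eq. (3): lam * cross_entropy + gamma * dice_loss_bg + delta * dice_loss_fg."""
    dice_loss_fg = 1.0 - soft_dice(p_fg, g)
    dice_loss_bg = 1.0 - soft_dice(1.0 - p_fg, 1.0 - g)
    ce = weighted_cross_entropy(p_fg, g, w_fg, w_bg)
    return lam * ce + gamma * dice_loss_bg + delta * dice_loss_fg

g = np.array([1.0, 0.0, 0.0, 0.0])                         # one lesion voxel
loss_good = total_loss(np.array([0.9, 0.1, 0.1, 0.1]), g)  # mostly correct prediction
loss_bad = total_loss(np.array([0.1, 0.9, 0.9, 0.9]), g)   # mostly wrong prediction
```

Including a separate foreground Dice term keeps the small lesion class from being swamped by the background, which is the class-imbalance concern raised above.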

Training. The proposed model was trained with a batch size of 6 for 90 epochs using Adam [5] as the optimizer. The provided dataset was split into training and validation subsets in the ratio of 7:3. The weights of the encoding layers of our network were initialized from a DenseNet-121 pre-trained on the ImageNet dataset. Since DenseNet-121 takes a 3-channel input whereas our network takes 5 channels, the weights of the first layer of our network were randomly initialized. During training, the model was saved after every epoch, and the model with the highest Dice score on the validation set was selected.
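The data split and the model selection rule can be illustrated with a short standard-library sketch. It assumes the 7:3 split was made at the patient level with a fixed random seed, which the text does not state; the helper names and the validation Dice values in the usage lines are hypothetical.

```python
import random

def split_patients(patient_ids, train_fraction=0.7, seed=0):
    """Shuffle patient IDs reproducibly and split them into train/validation lists."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    n_train = int(round(train_fraction * len(ids)))
    return ids[:n_train], ids[n_train:]

def best_epoch(val_dice_per_epoch):
    """Index (0-based) of the saved epoch with the highest validation Dice score."""
    return max(range(len(val_dice_per_epoch)), key=val_dice_per_epoch.__getitem__)

# 63 training patients split 7:3 gives 44 for training and 19 for validation.
train_ids, val_ids = split_patients(range(63))

# Hypothetical validation Dice scores for three saved epochs.
selected = best_epoch([0.31, 0.48, 0.45])
```

Splitting by patient rather than by slice keeps slices from the same patient out of both subsets at once, which would otherwise inflate the validation score.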

3 Experimental Setup and Result

3.1 Data Sets and Evaluation Criteria

Imaging data from acute stroke patients at two centers who presented within 8 h of stroke onset and underwent MRI DWI within 3 h after CT perfusion were included. To assess cerebral perfusion, a contrast agent (CA) was administered to the patient and its temporal change was captured in dynamic scans acquired 1–2 s apart. There were 63 patients in the training data set. Each patient’s data comprised a brain CT image and CBV, CBF, MTT and Tmax maps, with the ground truth generated on the DWI map. To assess our segmentation algorithm’s performance, the organizers employed Dice, precision, recall, Hausdorff distance, average distance and absolute volume difference.
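For reference, the overlap-based metrics in this list can be computed from voxel counts as sketched below. This is a generic NumPy illustration and not the organizers' evaluation code; the function name and the `voxel_volume_ml` parameter are hypothetical, and the two surface distance metrics (Hausdorff and average distance) are omitted because they need surface extraction.

```python
import numpy as np

def overlap_metrics(pred, gt, voxel_volume_ml=1.0):
    """Dice, precision, recall and absolute volume difference for binary masks."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    tp = int(np.logical_and(pred, gt).sum())    # lesion voxels correctly predicted
    fp = int(np.logical_and(pred, ~gt).sum())   # predicted lesion, actually background
    fn = int(np.logical_and(~pred, gt).sum())   # missed lesion voxels
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    dice = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 1.0
    avd = abs(int(pred.sum()) - int(gt.sum())) * voxel_volume_ml
    return {"dice": dice, "precision": precision, "recall": recall, "avd": avd}

# Prediction covers the two true lesion voxels plus one extra voxel.
metrics = overlap_metrics([1, 1, 1, 0], [1, 1, 0, 0])
```

For binary masks the count-based Dice used here, 2TP / (2TP + FP + FN), is equivalent to the sum form of Eq. 2.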

3.2 Experimental Results

See Fig. 6 and Table 1.

Fig. 6. Our model’s predictions on the held-out test set.

Table 1. The table provides the list of evaluation metrics used for evaluating our segmentation on the challenge test dataset (n = 62). HD - Hausdorff Distance, AD - Average Distance, AVD - Absolute Volume Difference. The values provided are mean (standard deviation).

3.3 Conclusion

We developed an automatic segmentation algorithm for ischemic stroke lesions using CT perfusion maps. CT imaging has several advantages over MRI, such as shorter scan time, lower cost and wider availability. The analysis of CT perfusion maps has been shown to be useful for early treatment planning at the onset of stroke.