
1 Introduction

The prevalence of myopia and high myopia is increasing globally at an alarming rate, with significant increases in the risk of vision impairment from pathologic conditions associated with high myopia, including retinal damage, cataract and glaucoma [1]. Myopic macular degeneration (MMD) is a major cause of vision impairment in high myopia. Lacquer cracks (LCs), signs of MMD, typically present as yellowish to white lines in the posterior segment of highly myopic eyes and are believed to be breaks in the choroid/retinal pigment epithelium (RPE)/Bruch's membrane complex [2]. The prevalence of LCs is 4.3%–9.2% in highly myopic eyes [3]. Patients with LCs are at high risk of visual impairment because LCs may lead to further adverse changes in the fundus, such as patchy chorioretinal atrophy or myopic choroidal neovascularization [4]. Thus, the segmentation of LCs is quite important in clinical ophthalmology, as it helps doctors diagnose MMD and analyze its development.

Indocyanine green angiography (ICGA) is considered the gold standard for LC detection. It provides details of the choroidal vasculature in highly myopic eyes and allows the location and extent of LCs to be observed much more clearly than with fundus photography or typical fluorescein angiography (FA) [2, 5, 6].

There have been few studies on LC segmentation. To achieve accurate segmentation of LCs in ICGA images, we propose a novel method based on conditional generative adversarial networks (cGANs) [7]. Just as generative adversarial networks (GANs) [8] learn a generative model of data, cGANs learn a conditional generative model. This makes cGANs well suited to image segmentation tasks, where we condition on an input image and generate the corresponding output segmentation image. Previous cGANs have tackled inpainting [9], image prediction from a normal map [10], image manipulation guided by user constraints [11], future frame prediction [12], etc. We are the first to apply a cGAN to LC segmentation. Motivated by the characteristics of ICGA images with LCs, a Dice loss term [13] is added to the cGAN objective to deal with the strong imbalance between the number of object and background pixels, so that the generator achieves better segmentation.

2 Method

2.1 Conditional Generative Adversarial Networks and Improvements

Image-conditional generative adversarial nets consist of two adversarial models: a generative model \( G \) that extracts image features and generates fake images, and a discriminator \( D \) that estimates the probability that an image came from the training data rather than from the generator.

The training process is diagrammed in Fig. 1. cGANs learn a mapping from an original image \( x \) and a random noise vector \( z \) to the ground truth \( y \). The generator \( G \) is trained to produce outputs that cannot be distinguished from real images, while the discriminator \( D \) is trained to detect the fake images produced by the generator.

Fig. 1. Diagram of conditional GAN.

The objective function of a cGAN can be expressed as follows [7]:

$$ L_{cGAN} (G,D) = E_{x,y \sim p_{data} (x,y)} [\log D(x,y)] + E_{x \sim p_{data} (x),\, z \sim p_{z} (z)} [\log (1 - D(x,G(x,z)))] $$
(1)

Since the generator tries to minimize the objective function against the adversarial discriminator that tries to maximize it, the final objective function is:

$$ F = \arg \min_{G} \max_{D} L_{cGAN} (G,D) $$
(2)
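As an illustration of this minimax game, a minimal PyTorch-style training step might look as follows; the model and optimizer objects are assumed, and the two binary cross-entropy terms correspond to the two expectations in Eq. (1). This is a sketch under our own assumptions, not the authors' released code.

```python
import torch
import torch.nn.functional as F

def cgan_train_step(G, D, x, y, opt_g, opt_d):
    """One alternating update of discriminator and generator (Eqs. 1-2)."""
    # Discriminator update: real pairs (x, y) -> 1, fake pairs (x, G(x)) -> 0.
    fake = G(x).detach()  # detach so this step does not update G
    d_real, d_fake = D(x, y), D(x, fake)
    loss_d = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
              + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator update: try to make D classify the fake pair as real.
    d_fake = D(x, G(x))
    loss_g = F.binary_cross_entropy_with_logits(d_fake, torch.ones_like(d_fake))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```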

Previous approaches to cGANs have found it beneficial to mix the GAN objective with a traditional loss, such as the L1 or L2 loss [9]. With an added L1 or L2 loss, the discriminator's job remains unchanged, but the generator is tasked with producing not only indistinguishable fake images but also images much closer to the ground truth. Following previous work [7], the L1 loss, which encourages less blurring than the L2 loss, is adopted in this paper.

To apply cGANs to LC segmentation, an improvement is made to the objective function. In ICGA images, LCs usually occupy a relatively small part of the whole image. This data imbalance often causes the learning process to get trapped in a local minimum of the loss function, yielding predictions that are heavily biased toward the background. The net may also mistake vessels, choroidal hemorrhage and the shadows at the edges of images for LCs. To solve this problem, a Dice loss term is added to the objective function. The Dice loss effectively handles the imbalance between the number of object and background pixels and makes the proposed segmentation much more accurate.

Thus, both the L1 loss and the Dice loss [13], given below, are adopted:

$$ L_{L1} (G) = E_{x,y \sim p_{data} (x,y),\, z \sim p_{z} (z)} [\left\| y - G(x,z) \right\|_{1}] $$
(3)
$$ L_{Dice} (G) = E_{x,y \sim p_{data} (x,y),\, z \sim p_{z} (z)} \left[ 1 - \frac{2\sum_{i=1}^{N} y_{i} G(x,z)_{i}}{\sum_{i=1}^{N} y_{i}^{2} + \sum_{i=1}^{N} G(x,z)_{i}^{2}} \right] $$
(4)

where the sums run over all \( N \) pixels of the generated binary segmentation, with pixels \( G(x,z)_{i} \in G(x,z) \), and of the ground truth binary mask, with pixels \( y_{i} \in y \).
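A direct PyTorch transcription of Eqs. (3) and (4) might look as follows; here fake is \( G(x,z) \) and target is \( y \), both of shape (batch, 1, H, W) with values in [0, 1], and the small eps (our addition, not part of Eq. (4)) guards against division by zero on empty masks.

```python
import torch

def l1_loss(fake: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Eq. (3): mean absolute difference between ground truth and output."""
    return torch.mean(torch.abs(target - fake))

def dice_loss(fake: torch.Tensor, target: torch.Tensor,
              eps: float = 1e-7) -> torch.Tensor:
    """Eq. (4): one minus the (soft) Dice coefficient over all N pixels."""
    intersection = torch.sum(target * fake)
    denominator = torch.sum(target ** 2) + torch.sum(fake ** 2)
    return 1.0 - 2.0 * intersection / (denominator + eps)
```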

The final objective function is:

$$ F = \arg \min_{G} \max_{D} L_{cGAN} (G,D) + \mu L_{L1} (G) + \lambda L_{Dice} (G) $$
(5)

Past cGANs [10] provided Gaussian noise as an input to the generator, since without it the net would produce deterministic outputs. In the proposed net, randomness is instead provided in the form of dropout, applied in several layers of the generator, and in the initialization of the kernels.
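Putting the pieces together, the generator side of Eq. (5) could be computed as below, reusing the l1_loss and dice_loss helpers sketched after Eq. (4). The weights mu and lambda_ are placeholders: the paper does not report the values it used, and mu = 100 merely follows the pix2pix default [7].

```python
import torch
import torch.nn.functional as F

def generator_objective(d_fake_logits, fake, target, mu=100.0, lambda_=1.0):
    """Generator part of Eq. (5): adversarial term + mu*L1 + lambda*Dice."""
    # The generator wants the discriminator to label its outputs as real.
    adv = F.binary_cross_entropy_with_logits(
        d_fake_logits, torch.ones_like(d_fake_logits))
    return adv + mu * l1_loss(fake, target) + lambda_ * dice_loss(fake, target)
```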

2.2 Network Architectures

The architecture of our network, including the generator and the discriminator, is illustrated in Figs. 2 and 3. The generator in Fig. 2 is similar to the traditional encoder-decoder architecture. Each encoding layer is a convolution layer with batch normalization and a ReLU-type activation; each decoding layer consists of a deconvolution, batch normalization and a ReLU activation. Dropout with a rate of 50% is applied in the first three decoding layers to help prevent overfitting during training. In practice, the leaky ReLU function, a variant of ReLU, is adopted with a slope of 0.2; it mitigates the vanishing gradient problem and makes the network converge much faster during training.
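In code, one encoding and one decoding layer of this generator might be sketched as follows; the padding of 1 (which halves or doubles the spatial size with 4 × 4 stride-2 filters) and the channel arguments are our assumptions.

```python
import torch.nn as nn

def encoder_block(in_ch: int, out_ch: int) -> nn.Sequential:
    """Convolution + batch norm + leaky ReLU (slope 0.2), halving H and W."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.2),
    )

def decoder_block(in_ch: int, out_ch: int, dropout: bool = False) -> nn.Sequential:
    """Deconvolution + batch norm + ReLU, doubling H and W; 50% dropout
    is switched on for the first three decoding layers."""
    layers = [
        nn.ConvTranspose2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(),
    ]
    if dropout:
        layers.append(nn.Dropout(0.5))
    return nn.Sequential(*layers)
```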

Fig. 2. Architecture of generator.

Fig. 3. Architecture of discriminator.

All convolutions and deconvolutions are \( 4 \times 4 \) spatial filters applied with stride 2. Unlike traditional deep convolutional networks, our network reduces the spatial size of the representation with stride-2 convolutions instead of pooling layers, since discarding pooling layers performs better when training good generative models [14].

We adopt the popular U-Net [15] as the main framework of our generator. In medical image segmentation tasks, the predicted segmentation shares structural information with the original image. Skip connections constrain the output to be aligned with the input and make the segmentation result more reasonable and accurate. In the proposed generator, high-resolution features from the contracting path are combined with the upsampled output via skip connections, so that the subsequent convolution layer can learn to assemble a more precise output from this information. Since LCs in ICGA images are mostly tiny and irregular, skip connections, which help the generator produce images with more details that look similar to real LCs, can improve the accuracy of our segmentation.
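A toy three-level version of such a U-Net generator, reusing the encoder_block and decoder_block helpers above, is sketched below; the depth and channel widths are illustrative, not the exact configuration of the paper.

```python
import torch
import torch.nn as nn

class UNetGenerator(nn.Module):
    """Encoder-decoder with skip connections (channels are illustrative)."""
    def __init__(self):
        super().__init__()
        self.enc1 = encoder_block(1, 64)     # 768 -> 384
        self.enc2 = encoder_block(64, 128)   # 384 -> 192
        self.enc3 = encoder_block(128, 256)  # 192 -> 96
        self.dec3 = decoder_block(256, 128, dropout=True)       # 96 -> 192
        self.dec2 = decoder_block(128 + 128, 64, dropout=True)  # 192 -> 384
        self.dec1 = nn.ConvTranspose2d(64 + 64, 1, 4, 2, 1)     # 384 -> 768

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(e1)
        e3 = self.enc3(e2)
        d3 = self.dec3(e3)
        d2 = self.dec2(torch.cat([d3, e2], dim=1))   # skip connection from enc2
        out = self.dec1(torch.cat([d2, e1], dim=1))  # skip connection from enc1
        return torch.sigmoid(out)  # binary segmentation map in [0, 1]
```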

PatchGAN is adopted in the discriminator, which is shown in Fig. 3. A traditional GAN discriminator for image processing estimates the probability that an image is real or fake and outputs a single scalar. In contrast, patchGAN tries to classify whether each \( N \times N \) patch in an image is real or fake. We run this discriminator convolutionally across the image and average all responses to obtain the final output probability.

We create two copies of the discriminator with the same underlying variables, one for real pairs and one for fake pairs. For real pairs, we first concatenate the input and the ground truth; for fake pairs, we first concatenate the input and the generator output. Both copies then run through five encoding layers. The convolutions are \( 4 \times 4 \) spatial filters with stride 2, except in the last two layers, which use stride 1. These two layers apply zero-padding before convolution to change the size of the representation in every channel from \( 32 \times 32 \) to \( 30 \times 30 \), making the size of the receptive field, i.e. the \( N \) in patchGAN, equal to 70; this yields better image quality than a receptive field covering the full image [7]. In the final \( 30 \times 30 \) map, each pixel represents the probability that a \( 70 \times 70 \) patch of the original image is real.

A discriminator with patchGAN effectively models the image as a Markov random field, assuming independence between pixels separated by more than a patch diameter [16]. It has been demonstrated that the patch can be much smaller than the full image and still produce high-quality results [7]. Compared to traditional discriminators, a patchGAN discriminator has fewer parameters, runs faster and can be applied to arbitrarily large images.
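The discriminator described above could be sketched as follows; channel widths follow the pix2pix 70 × 70 patchGAN [7], and we assume a 256 × 256 input pair, consistent with the 32 × 32 → 30 × 30 sizes quoted above. These details are our assumptions wherever the paper does not specify them.

```python
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """Five-layer 70x70 patchGAN: strides 2, 2, 2, 1, 1 with 4x4 filters."""
    def __init__(self, in_ch: int = 2):  # ICGA image + segmentation, concatenated
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.BatchNorm2d(128), nn.LeakyReLU(0.2),
            nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.BatchNorm2d(256), nn.LeakyReLU(0.2),
            nn.Conv2d(256, 512, 4, stride=1, padding=1), nn.BatchNorm2d(512), nn.LeakyReLU(0.2),
            nn.Conv2d(512, 1, 4, stride=1, padding=1),  # one logit per 70x70 patch
        )

    def forward(self, image: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
        # Real pairs concatenate (image, ground truth); fake pairs (image, G output).
        return self.net(torch.cat([image, mask], dim=1))
```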

3 Experiments and Results

The proposed network is evaluated on an ICGA data set of patients with LCs. The data set consists of 22 annotated ICGA images of size \( 768 \times 768 \). Because of the small amount of training data, we use extensive data augmentation by flipping images vertically and horizontally. During the experiments, 6 images were randomly chosen as testing images; data augmentation was applied to the remaining 16 images, yielding a training set of 64 images.
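The flip-based augmentation could be as simple as the following sketch, in which each image/mask pair yields four variants (identity, horizontal, vertical and both flips), turning the 16 training pairs into 64; the numpy representation is our assumption.

```python
import numpy as np

def flip_augment(image: np.ndarray, mask: np.ndarray):
    """Return the original pair plus horizontal, vertical and combined flips."""
    return [
        (image, mask),
        (np.fliplr(image), np.fliplr(mask)),                         # horizontal flip
        (np.flipud(image), np.flipud(mask)),                         # vertical flip
        (np.flipud(np.fliplr(image)), np.flipud(np.fliplr(mask))),   # both flips
    ]
```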

To evaluate the segmentation performance, our method is compared with U-Net, a popular network specialized for biomedical image segmentation, with DenseNet [17], a newer network that excels at feature extraction, and with the original cGAN without the Dice loss. A U-Net with 5 layers and a DenseNet with 7 dense blocks, the configurations that performed best on the ICGA data set, are used in the comparison. The segmentation results are shown in Fig. 4.

Fig. 4. Segmentation results of different networks. (a) Original ICGA image. (b) The ground truth. (c) Results of original cGAN. (d) Results of the proposed net. (e) Results of DenseNet. (f) Results of U-Net.

As shown in Fig. 4, the original cGAN, the improved cGAN and DenseNet perform better than U-Net at segmenting LCs. ICGA images with LCs do not contain many obvious features, and the intensity information of LCs is easily confused with that of vessels, the macula and the shadows at the edges of images, so it is quite difficult for U-Net, which lacks a discriminator network, to extract the key features. Both the original and the improved cGAN produce reasonable segmentations that are quite similar to the ground truth. However, the original cGAN and DenseNet segment parts of choroidal hemorrhages and retinal vessels as false positives. This effect is drastically suppressed in our proposed net due to the addition of the Dice loss.

To make the comparison more quantitative, we adopt intersection-over-union (IoU) and pixel accuracy (PA) to evaluate the segmentation results in Table 1. IoU is the standard metric for segmentation: it computes the ratio between the intersection and the union of the ground truth and the predicted segmentation, which can be reformulated as the number of true positives (TP) over the sum of true positives, false positives (FP) and false negatives (FN). Pixel accuracy, as defined in Eq. (7), computes the ratio of true positives to all pixels predicted as LC [18]:

$$ IoU = \frac{TP}{TP + FP + FN} $$
(6)
$$ PA = \frac{TP}{TP + FP} $$
(7)

Table 1. Quantitative segmentation results of different networks.
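Both metrics can be computed directly from binary masks, as in the sketch below (numpy arrays of zeros and ones are assumed); note that PA as written in Eq. (7) is the fraction of predicted LC pixels that are correct.

```python
import numpy as np

def iou_and_pa(pred: np.ndarray, truth: np.ndarray):
    """IoU (Eq. 6) and PA (Eq. 7) from binary prediction/ground-truth masks."""
    tp = np.sum((pred == 1) & (truth == 1))
    fp = np.sum((pred == 1) & (truth == 0))
    fn = np.sum((pred == 0) & (truth == 1))
    iou = tp / (tp + fp + fn)
    pa = tp / (tp + fp)
    return iou, pa
```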

As shown in Table 1, the three cGAN variants and DenseNet all achieve better segmentation than U-Net in terms of both IoU and PA. The results of DenseNet are overall better than those of the original cGAN but worse than those of the improved cGAN. A cGAN with only the Dice loss is included to reflect the importance of the Dice loss: compared with the original cGAN, it achieves higher PA but lower IoU. In theory, the Dice loss imposes a stricter constraint on the net, so the segmentation contains only the most obvious parts of the LCs and loses the parts that are less obvious to the net. The L1 loss is also important in the cGAN, since it penalizes the difference between the ground truth and the output and encourages the output to be aligned with the input. Finally, the proposed net with both the L1 loss and the Dice loss achieves better IoU and better PA than the other nets, and appears to be the most appropriate method for the LC segmentation problem.

4 Conclusion

We propose an improved conditional GAN to segment LCs in ICGA images. Compared with the original cGAN, U-Net and DenseNet, adding the Dice loss solves the data imbalance problem and improves the segmentation results. According to the experiments on our data set, the segmentation produced by the proposed network is overall better than that of the other nets.