DeepVessel: Retinal Vessel Segmentation via Deep Learning and Conditional Random Field

Fu, Huazhu; Xu, Yanwu; Lin, Stephen; Wong, Damon Wing Kee; Liu, Jiang

doi:10.1007/978-3-319-46723-8_16

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 9901)

Included in the following conference series:

International Conference on Medical Image Computing and Computer-Assisted Intervention

Abstract

Retinal vessel segmentation is a fundamental step for various ocular imaging applications. In this paper, we formulate the retinal vessel segmentation problem as a boundary detection task and solve it using a novel deep learning architecture. Our method is based on two key ideas: (1) applying a multi-scale and multi-level Convolutional Neural Network (CNN) with a side-output layer to learn a rich hierarchical representation, and (2) utilizing a Conditional Random Field (CRF) to model the long-range interactions between pixels. We combine the CNN and CRF layers into an integrated deep network called DeepVessel. Our experiments show that the DeepVessel system achieves state-of-the-art retinal vessel segmentation performance on the DRIVE, STARE, and CHASE_DB1 datasets with an efficient running time.

1 Introduction

Retinal vessels are of much diagnostic significance, as they are commonly examined to evaluate and monitor various ophthalmological diseases. However, manual segmentation of retinal vessels is both tedious and time-consuming. To assist with this task, many approaches have been introduced in the last two decades to segment retinal vessels automatically. For example, Marin et al. employed the gray-level vector and moment invariant features to classify each pixel using a neural network [8]. Nguyen et al. utilized a multi-scale line detection scheme to compute vessel segmentation [11]. Orlando et al. performed vessel segmentation using a fully-connected Conditional Random Field (CRF) whose configuration is learned using a structured-output support vector machine [12]. Existing methods such as these, however, lack sufficiently discriminative representations and are easily affected by pathological regions, as shown in Fig. 1.

Deep learning (DL) has recently been demonstrated to yield highly discriminative representations that have aided many computer vision tasks. For example, Convolutional Neural Networks (CNNs) have brought heightened performance in image classification and semantic image segmentation. Xie et al. employed a holistically-nested edge detection (HED) system with deep supervision to resolve the challenging ambiguity in object boundary detection [16]. Zheng et al. reformulated the Conditional Random Field (CRF) as a Recurrent Neural Network (RNN) to improve semantic image segmentation [18]. These works inspire us to learn a rich hierarchical representation based on a DL architecture.

A DL-based vessel segmentation method was proposed in [9], which addressed the problem as pixel classification using a deep neural network. In [7], Li et al. employed a cross-modality data transformation from retinal image to vessel map, and output the label map of all pixels for a given image patch. These methods have two drawbacks: first, they do not account for non-local correlations in classifying individual pixels/patches, which leads to failures caused by noise and local pathological regions; second, the classification strategy is computationally intensive in both the training and testing phases. In this paper, we address retinal vessel segmentation as a boundary detection task that is solved using a novel DL system called DeepVessel, which utilizes a CNN with a side-output layer to learn discriminative representations, and also a CRF layer that accounts for non-local pixel correlations. With this approach, our DeepVessel system achieves state-of-the-art performance on publicly available datasets (DRIVE, STARE, and CHASE_DB1) with relatively efficient processing.

2 Proposed Method

Our DeepVessel architecture consists of three main layers. The first is a convolutional layer used to learn a multi-scale discriminative representation. The second is a side-output layer that operates with the early layers to generate a companion local output. The last one is a CRF layer, which is employed to further take into account the non-local pixel correlations. The overall architecture of our DeepVessel system is illustrated in Fig. 2.

Convolutional Layer is used to learn local feature representations based on patches randomly sampled from the image. Suppose $\mathbf L _j^{(n)}$ is the j-th output map of the n-th layer, and $\mathbf L _i^{(n-1)}$ is the i-th input map of the n-th layer. The output of the convolutional layer is then defined as:

$$\begin{aligned} \mathbf L _j^{(n)} = f (\sum _i \mathbf L _i^{(n-1)} *\mathbf W _{ij}^{(n)} + b_j^{(n)}{} \mathbf 1 ), \end{aligned}$$

(1)

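where $\mathbf W _{ij}^{(n)}$ is the kernel linking the i-th input map to the j-th output map, $*$ denotes the convolution operator, $b_j^{(n)}$ is the bias element, $\mathbf 1$ is an all-ones map of the same size as the output, and $f(\cdot)$ is the nonlinear activation function.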
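As an illustration of Eq. (1), the following is a minimal sketch in plain Python rather than the Caffe implementation used in our experiments. The function names `conv2d_valid`, `relu`, and `conv_layer`, the choice of "valid" borders, and the nested-list data layout are assumptions made for this example only.

```python
def conv2d_valid(x, w):
    """'Valid' 2-D convolution of map x with kernel w (lists of lists)."""
    kh, kw = len(w), len(w[0])
    oh, ow = len(x) - kh + 1, len(x[0]) - kw + 1
    out = [[0.0] * ow for _ in range(oh)]
    for r in range(oh):
        for c in range(ow):
            acc = 0.0
            for u in range(kh):
                for v in range(kw):
                    # True convolution flips the kernel along both axes.
                    acc += x[r + u][c + v] * w[kh - 1 - u][kw - 1 - v]
            out[r][c] = acc
    return out


def relu(z):
    """Rectified linear unit, used here as the activation f (an assumption)."""
    return max(0.0, z)


def conv_layer(inputs, kernels, biases, f=relu):
    """Eq. (1): L_j = f(sum_i L_i * W_ij + b_j 1).

    inputs:  list of input maps L_i
    kernels: kernels[i][j] is the kernel from input map i to output map j
    biases:  biases[j] is the scalar bias b_j of output map j
    """
    outputs = []
    for j, b in enumerate(biases):
        maps = [conv2d_valid(x, kernels[i][j]) for i, x in enumerate(inputs)]
        rows, cols = len(maps[0]), len(maps[0][0])
        # Sum the per-input responses, add the bias to every pixel, apply f.
        outputs.append([[f(sum(m[r][c] for m in maps) + b)
                         for c in range(cols)] for r in range(rows)])
    return outputs
```

For a single 3 × 3 input map with rows (1, 2, 3), (4, 5, 6), (7, 8, 9), the 2 × 2 kernel with rows (1, 0), (0, −1), and bias 0.5, every entry of the 2 × 2 output map is 4.5. Practical frameworks replace these loops with optimized array routines.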

Side-output Layer acts as a classifier that produces a companion local output for early layers [6]. Suppose $\mathbf {W}$ denotes the parameters of all the convolutional layers, and there are M side-output layers in the network, where the corresponding weights are denoted as $\mathbf {w}=(\mathbf {w}^{(1)},...,\mathbf {w}^{(M)})$. The objective function of the side-output layer is given as:

$$\begin{aligned} \mathcal {L}_{s}(\mathbf {W}, \mathbf {w}) = \sum ^M_{m=1} \alpha _m L^{(m)}_s(\mathbf {W}, \mathbf {w}^{(m)}), \end{aligned}$$

(2)

where $\alpha _m$ is the loss function fusion-weight for each side-output layer, and $L^{(m)}_s$ denotes the image-level loss function, which is computed over all pixels in the training retinal image X and its vessel ground truth Y. In a retinal image, the vessel and background pixels are imbalanced, so we follow HED [16] and utilize a class-balanced cross-entropy loss function:

$$\begin{aligned} L^{(m)}_s(\mathbf {W}, \mathbf {w}^{(m)}) = -\frac{|Y^-|}{|Y|} \sum _{j\in Y^+} \log \sigma ( a_j^{(m)}) -\frac{|Y^+|}{|Y|} \sum _{j\in Y^-} \log ( 1 - \sigma (a_j^{(m)})), \end{aligned}$$

(3)

where $|Y^+|$ and $|Y^-|$ denote the numbers of vessel and background pixels in the ground truth Y, respectively, and $\sigma (a_j^{(m)})$ is the sigmoid function applied to pixel j of the activation map $A_s^{(m)} \equiv \{a_j^{(m)}, j=1,...,|Y|\}$ in side-output layer m. Simultaneously, we can obtain the vessel prediction map of each side-output layer m by $\hat{Y}_s^{(m)} = \sigma (A_s^{(m)} )$.
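The losses in Eqs. (2) and (3) can be sketched in a few lines of plain Python. This is an illustrative sketch only: the helper names `log_sigmoid`, `class_balanced_bce`, and `side_output_loss` are hypothetical, images are assumed to be flattened into lists, and labels are assumed to be 1 for vessel and 0 for background.

```python
import math


def log_sigmoid(a):
    """Numerically stable log(sigma(a))."""
    if a >= 0:
        return -math.log1p(math.exp(-a))
    return a - math.log1p(math.exp(a))


def class_balanced_bce(activations, labels):
    """Eq. (3) for one side-output layer.

    activations: flattened activation map a_j
    labels:      flattened ground truth (1 = vessel, 0 = background)
    """
    n = len(labels)
    n_pos = sum(labels)      # |Y+|
    n_neg = n - n_pos        # |Y-|
    loss = 0.0
    for a, y in zip(activations, labels):
        if y == 1:
            # Vessel pixels are weighted by the background fraction.
            loss -= (n_neg / n) * log_sigmoid(a)
        else:
            # log(1 - sigma(a)) equals log(sigma(-a)).
            loss -= (n_pos / n) * log_sigmoid(-a)
    return loss


def side_output_loss(side_activations, labels, alphas):
    """Eq. (2): fusion-weighted sum of the per-layer losses."""
    return sum(alpha * class_balanced_bce(a, labels)
               for alpha, a in zip(alphas, side_activations))
```

With one vessel pixel among four and all activations equal to zero, the single vessel pixel and the three background pixels each contribute 0.75 ln 2 in total, which shows how the weighting equalizes the influence of the two classes.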

Conditional Random Field (CRF) Layer is used to model non-local pixel correlations. Although the CNN can produce a satisfactory vessel probability map, it still has some problems. First, a traditional CNN has convolutional filters with large receptive fields and hence produces maps too coarse for pixel-level vessel segmentation (e.g., non-sharp boundaries and blob-like shapes). Second, a CNN lacks smoothness constraints, which may result in small spurious regions in the segmentation output. Thus, we utilize a CRF layer to obtain the final vessel segmentation result. Following the fully-connected CRF model of [5], every node is a neighbor of every other node, so the model takes into account long-range interactions across the whole image. We denote $\mathbf v = \{v_i\}$ as a labeling over all pixels of the image, with $v_i =1$ for vessel and $v_i=0$ for background. The energy of a label assignment $\mathbf v $ is given by:

$$\begin{aligned} E(\mathbf v ) = \sum _i \psi _u (v_i) + \sum _{i<j} \psi _p (v_i, v_j), \end{aligned}$$

(4)

with:

$$\begin{aligned} \psi _u (v_i) = \frac{1}{M}\sum _{m=1}^M a_i^{(m)}, \;\; \text{and} \;\; \psi _p (v_i, v_j) = \mu (v_i, v_j) \sum _{d=1}^D h^{(d)} k^{(d)} (\mathbf f _i, \mathbf f _j), \end{aligned}$$

(5)

where $\psi _u (v_i)$ and $\psi _p (v_i, v_j)$ are the unary and pairwise terms, respectively. $a_i^{(m)}$ is the value at pixel i in the activation map $A_s^{(m)}$ of side-output layer m, $\mu (v_i, v_j)$ is the label compatibility function, $h^{(d)}$ is the weight of the d-th kernel, and $k^{(d)}$ for $d=1,...,D$ is the Gaussian kernel applied on feature vectors. The feature vector of pixel i, denoted by $\mathbf f _i$, is derived from image features such as spatial location and RGB values. An effective solution to minimize the CRF energy $E(\mathbf v )$ in Eq. (4) is through mean-field approximation [5]. In our system, we employ the implementation of [18], in which the CRF is reformulated as a Recurrent Neural Network (RNN) layer and can be utilized in an end-to-end DL architecture.
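To make the structure of the energy in Eq. (4) concrete, the sketch below evaluates it by brute force for a handful of pixels. It rests on several assumptions that are not stated in the text: the unary cost of a label is taken as the negative log-probability implied by the averaged side-output activation, the compatibility function $\mu$ is the Potts model of [5] (1 when the labels differ, 0 otherwise), and each Gaussian kernel has a single bandwidth `theta`. The names `crf_energy` and `gaussian_kernel` are hypothetical, and the quadratic loop is for illustration only; [5] and [18] rely on efficient filtering and mean-field inference instead.

```python
import math


def log_sigmoid(a):
    """Numerically stable log(sigma(a))."""
    if a >= 0:
        return -math.log1p(math.exp(-a))
    return a - math.log1p(math.exp(a))


def gaussian_kernel(fi, fj, theta):
    """Gaussian kernel on two feature vectors (assumed single bandwidth)."""
    d2 = sum((a - b) ** 2 for a, b in zip(fi, fj))
    return math.exp(-d2 / (2.0 * theta ** 2))


def crf_energy(labels, mean_act, feats, h, thetas):
    """Brute-force E(v) = sum_i psi_u(v_i) + sum_{i<j} psi_p(v_i, v_j).

    labels:   v_i in {0, 1} for each pixel
    mean_act: side-output activations averaged over the M layers
    feats:    feature vector f_i per pixel (e.g., position and color)
    h:        kernel weights h^(d)
    thetas:   assumed bandwidth of each kernel k^(d)
    """
    energy = 0.0
    # Unary term (assumption): negative log-probability of the chosen label.
    for v, a in zip(labels, mean_act):
        energy -= log_sigmoid(a) if v == 1 else log_sigmoid(-a)
    # Pairwise term with Potts compatibility: only differing labels pay.
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):
            if labels[i] != labels[j]:
                energy += sum(hd * gaussian_kernel(feats[i], feats[j], t)
                              for hd, t in zip(h, thetas))
    return energy
```

Under these assumptions, a labeling that flips a single pixel between two similar neighbors pays a pairwise penalty for each disagreeing pair, which is how the CRF suppresses small spurious regions.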

Our DeepVessel Architecture is an end-to-end system illustrated in Fig. 2, which contains four CNN stages and one CRF stage. Each CNN stage includes multiple convolutional and ReLU layers, and one side-output layer. The side-output layer is connected to the last convolutional layer in each stage to support deep layer supervision. The objective function of the whole system is:

$$\begin{aligned} (\mathbf {W}, \mathbf {w}, \mathbf {h}) =\arg \min \left( \mathcal {L}_s(\mathbf {W}, \mathbf {w}) + L^{CRF}_s (\mathbf {W}, \mathbf {w}, \mathbf {h}) \right) , \end{aligned}$$

(6)

where $\mathbf {h}$ is a CRF layer parameter, $\mathcal {L}_s$ is the CNN layer loss function in Eq. (2), and $L^{CRF}_s$ is the CRF layer loss function, specifically the class-balanced cross-entropy loss function in Eq. (3). We minimize the objective function via standard stochastic gradient descent. In our DeepVessel architecture, we only employ four CNN stages with side-output layers. The main reason is that retinal vessels in fundus images are different from general object edges in natural images. An object edge separates two regions of different appearance, which allows the boundary to be detectable even at deeper layers. By contrast, a retinal vessel appears merely as a curved line, which is too thin to respond in the higher stride layers. Thus, we only employ four side-output layers. The vessel prediction map example for each side-output layer is shown in Fig. 3, where earlier side-output layers have a smaller receptive field size and respond to local details, while deeper layers represent appearance at a larger scale.

3 Experiments

We implement our framework using the Caffe library and build on top of the implementation of HED [16]. The model parameters follow the configuration used in [16]. We employ a two-step fine-tuning approach that first utilizes the ARIA dataset [2] to fine-tune the initial parameters, and then the DRIVE training set [15] to obtain the final fine-tuned parameters. We rotate all training images to eight different angles, and rescale the ARIA images to the same size as the DRIVE images. The whole fine-tuning phase takes about two days on a single NVIDIA K40 GPU (10,000 iterations). For a 565 $\times $ 584 image, it takes about 1.3 s to generate the final vessel map.

3.1 Experimental Results

We evaluate our method (see Note 1) on three publicly available datasets: DRIVE [15], STARE [4], and CHASE_DB1 [3]. These datasets provide two manual segmentations generated by two different experts for each image. As in the literature, the segmentation of the first observer is used as ground truth for performance evaluation. We perform the evaluation in terms of Accuracy ($Acc =\frac{TP+TN}{TP+FN+TN+FP}$) and Sensitivity ($Sen = \frac{TP}{TP+FN}$), where TP, TN, FP and FN represent the number of true positives, true negatives, false positives and false negatives, respectively. Note that there is no training set in the STARE and CHASE_DB1 datasets, so we only utilize the DRIVE training set to fine-tune the final parameters.
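The two evaluation measures can be computed directly from a binary prediction and the ground truth. The following is a minimal sketch in plain Python; the function names and the flattened-list representation of the images are assumptions of this example.

```python
def confusion_counts(pred, truth):
    """Count TP, TN, FP, FN over flattened binary maps (1 = vessel)."""
    tp = tn = fp = fn = 0
    for p, t in zip(pred, truth):
        if p == 1 and t == 1:
            tp += 1
        elif p == 0 and t == 0:
            tn += 1
        elif p == 1 and t == 0:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """Acc = (TP + TN) / (TP + FN + TN + FP)."""
    return (tp + tn) / (tp + fn + tn + fp)


def sensitivity(tp, fn):
    """Sen = TP / (TP + FN)."""
    return tp / (tp + fn)
```

For example, the prediction (1, 1, 0, 0, 1, 0) against the ground truth (1, 0, 0, 0, 1, 1) gives TP = 2, TN = 2, FP = 1, and FN = 1, so both Acc and Sen equal 2/3.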

Table 1. Performance of different segmentation methods on three datasets.

We compare our method with several state-of-the-art vessel segmentation methods, and also report the labeling of the second observer as the performance of a human observer. Our DeepVessel system outputs a probability map, and Otsu’s thresholding method [13] is employed to obtain the binary labeling result automatically in the experiments. Table 1 lists the performances on the three datasets, where the reported performance scores from the original papers are used. Our method obtains the best Accuracy scores among the compared methods, which include the other DL method [9] on the DRIVE dataset. Our method also obtains an Accuracy similar to that of the human observer on the CHASE_DB1 dataset and a better Accuracy score on the other two datasets.
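The binarization step can be illustrated with a compact version of Otsu's method [13], which selects the threshold that maximizes the between-class variance of the histogram. This is a sketch in plain Python, not the code used in our experiments; the names `otsu_threshold` and `binarize`, the 256-bin histogram, and the assumption that probabilities lie in [0, 1] are choices made for this example.

```python
def otsu_threshold(values, bins=256):
    """Return the Otsu threshold for values assumed to lie in [0, 1]."""
    hist = [0] * bins
    for v in values:
        hist[min(int(v * bins), bins - 1)] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    w0 = 0          # number of samples in the lower class
    sum0 = 0.0      # sum of bin indices in the lower class
    best_var, best_bin = -1.0, 0
    for t in range(bins):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor of 1 / total^2.
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_bin = var_between, t
    # Values in bins above best_bin are treated as vessel.
    return (best_bin + 1) / bins


def binarize(values, threshold):
    """Label a pixel as vessel (1) when its probability reaches the threshold."""
    return [1 if v >= threshold else 0 for v in values]
```

For a clearly bimodal probability map, such as six pixels at 0.1 and four at 0.9, the selected threshold falls between the two modes and the four high-probability pixels are labeled as vessel.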

Table 1 also reports the results of the individual side-output layers and of their average fusion, as well as our results without side-output layers (DeepVessel w/o S). We observe that the second and third side-output layers obtain better performance than the other two layers, which is also visible in Fig. 3. The side-output fusion combines all the side-output layer outputs and generally performs better than any of the individual layers and the version without side-output layers. Figure 4 displays some results. It can be observed that our DeepVessel with CRF produces a clearer vessel segmentation result than the fusion result from only the side-output layers, especially for pathological regions as shown in the second row of Fig. 4.

4 Conclusion

In this paper, we have developed a retinal vessel segmentation method, called DeepVessel, based on a novel deep learning architecture. A discriminative representation is learned by a CNN with side-output layers, and a high quality vessel probability map is produced using a CRF layer. We have demonstrated that our system produces state-of-the-art results on three publicly available datasets.

Notes

1. Our results on all three datasets can be downloaded from http://hzfu.github.io/subpage/deepvessel/deepvessel.html.

References

1. Azzopardi, G., Strisciuglio, N., Vento, M., Petkov, N.: Trainable COSFIRE filters for vessel delineation with application to retinal images. Med. Image Anal. 19(1), 46–57 (2015)
2. Farnell, D., Hatfield, F., Knox, P., Reakes, M., Spencer, S., Parry, D., Harding, S.P.: Enhancement of blood vessels in digital fundus photographs via the application of multiscale line operators. J. Franklin Inst. 345(7), 748–765 (2008)
3. Fraz, M., Remagnino, P., Hoppe, A., Uyyanonvara, B., Rudnicka, A., Owen, C., Barman, S.: An ensemble classification-based approach applied to retinal blood vessel segmentation. IEEE Trans. Biomed. Eng. 59(9), 2538–2548 (2012)
4. Hoover, A., Kouznetsova, V., Goldbaum, M.: Locating blood vessels in retinal images by piecewise threshold probing of a matched filter response. IEEE Trans. Med. Imaging 19(3), 203–210 (2000)
5. Krähenbühl, P., Koltun, V.: Efficient inference in fully connected CRFs with Gaussian edge potentials. In: Conference on Neural Information Processing Systems, pp. 109–117 (2011)
6. Lee, C., Xie, S., Gallagher, P., Zhang, Z., Tu, Z.: Deeply-supervised nets. In: International Conference on Artificial Intelligence and Statistics (2015)
7. Li, Q., Feng, B., Xie, L., Liang, P., Zhang, H., Wang, T.: A cross-modality learning approach for vessel segmentation in retinal images. IEEE Trans. Med. Imaging 35(1), 109–118 (2016)
8. Marin, D., Aquino, A., Gegundez-Arias, M., Bravo, J.: A new supervised method for blood vessel segmentation in retinal images by using gray-level and moment invariants-based features. IEEE Trans. Med. Imaging 30(1), 146–158 (2011)
9. Melinscak, M., Prentasic, P., Loncaric, S.: Retinal vessel segmentation using deep neural networks. In: International Conference on Computer Vision Theory and Applications, pp. 557–582 (2015)
10. Mendonca, A., Campilho, A.: Segmentation of retinal blood vessels by combining the detection of centerlines and morphological reconstruction. IEEE Trans. Med. Imaging 25(9), 1200–1213 (2006)
11. Nguyen, U., Bhuiyan, A., Park, L., Ramamohanarao, K.: An effective retinal blood vessel segmentation method using multi-scale line detection. Pattern Recogn. 46(3), 703–715 (2013)
12. Orlando, J.I., Blaschko, M.: Learning fully-connected CRFs for blood vessel segmentation in retinal images. In: Golland, P., Hata, N., Barillot, C., Hornegger, J., Howe, R. (eds.) MICCAI 2014. LNCS, vol. 8673, pp. 634–641. Springer, Heidelberg (2014). doi:10.1007/978-3-319-10404-1_79
13. Otsu, N.: A threshold selection method from gray-level histograms. IEEE Trans. Syst. Man Cybern. 9(1), 62–66 (1979)
14. Roychowdhury, S., Koozekanani, D., Parhi, K.: Iterative vessel segmentation of fundus images. IEEE Trans. Biomed. Eng. 62(7), 1738–1749 (2015)
15. Staal, J., Abràmoff, M., Niemeijer, M., Viergever, M., van Ginneken, B.: Ridge-based vessel segmentation in color images of the retina. IEEE Trans. Med. Imaging 23(4), 501–509 (2004)
16. Xie, S., Tu, Z.: Holistically-nested edge detection. In: International Conference on Computer Vision, pp. 1395–1403 (2015)
17. Zhao, Y., Wang, X., Wang, X., Shih, F.: Retinal vessels segmentation based on level set and region growing. Pattern Recogn. 47(7), 2437–2446 (2014)
18. Zheng, S., Jayasumana, S., Romera-Paredes, B., Vineet, V., Su, Z., Du, D., Huang, C., Torr, P.: Conditional random fields as recurrent neural networks. In: International Conference on Computer Vision, pp. 1529–1537 (2015)

Author information

Authors and Affiliations

Institute for Infocomm Research, A*STAR, Singapore, Singapore
Huazhu Fu, Yanwu Xu, Damon Wing Kee Wong & Jiang Liu
Microsoft Research, Beijing, China
Stephen Lin
Cixi Institute of Biomedical Engineering, Ningbo Institute of Materials Technology and Engineering, Chinese Academy of Sciences, Ningbo, China
Jiang Liu

Corresponding author

Correspondence to Huazhu Fu.

Editor information

Editors and Affiliations

University College London, London, United Kingdom
Sebastien Ourselin
The Hebrew University of Jerusalem, Jerusalem, Israel
Leo Joskowicz
Harvard Medical School, Boston, Massachusetts, USA
Mert R. Sabuncu
Istanbul Technical University, Istanbul, Turkey
Gozde Unal
Harvard Medical School and Brigham and Women's Hospital, Boston, Massachusetts, USA
William Wells

About this paper

Cite this paper

Fu, H., Xu, Y., Lin, S., Wong, D.W.K., Liu, J. (2016). DeepVessel: Retinal Vessel Segmentation via Deep Learning and Conditional Random Field. In: Ourselin, S., Joskowicz, L., Sabuncu, M., Unal, G., Wells, W. (eds) Medical Image Computing and Computer-Assisted Intervention – MICCAI 2016. Lecture Notes in Computer Science, vol 9901. Springer, Cham. https://doi.org/10.1007/978-3-319-46723-8_16

DOI: https://doi.org/10.1007/978-3-319-46723-8_16
Published: 02 October 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46722-1
Online ISBN: 978-3-319-46723-8
eBook Packages: Computer Science, Computer Science (R0)
