
1 Introduction

Object segmentation, assigning semantic labels to pixels within an object, is a fundamental problem in medical image analysis. Reproducible classification or grading of adenocarcinomas benefits from accurate segmentation of epithelial glands from histology images [4, 6, 10]. Despite great advances in histology gland segmentation, many challenges remain. The complexity of glandular objects’ appearance, which correlates with the degree of cancer differentiation (e.g. high-grade tumours present degenerated glands), and the high variability in histology image acquisition (i.e. microscope, lighting, and staining) account for two of the major challenges in histopathology gland segmentation [14].

Fig. 1. Multi-region gland representation. Example delineations of (a) benign and (b) malignant colon adenocarcinoma glands. Example segmentation outputs of (c) an FCN trained without topological priors and (d) an FCN trained with a topology-aware loss; topological violations are indicated with red arrows. (e) The topological relationships between the multi-region gland components. (f) Topological validity indicator \(V(y_p)\) for each possible labelling \(y_p\) of a pixel p. Blue regions in (a,b) represent the inner glandular lumen as well as goblet cells if present (denoted U in (e,f)). Green regions delineate the epithelial boundary around the gland (E in (e,f)). The background (purple) indicates stromal nuclei (S in (e,f)).

Generally, state-of-the-art segmentation techniques benefit from incorporating prior knowledge about the target structures into the segmentation formulation [2, 11]. Recent gland segmentation methods, e.g.  [5, 14], are no exception as they do encode gland geometrical priors into their formulation, namely that glands are smooth tubular structures, composed of a central area (lumen) surrounded by epithelial cells forming a nuclear boundary around the lumen (examples in Fig. 1-(a,b)). However, a limitation of these works is that they rely on hand-crafted features (often pixel-level color and texture cues) to detect each glandular component, which can be susceptible to biological and staining variation. To counteract these problems, existing works commonly resort to ad-hoc post-processing methods for false negative removal and object delineation.

The recent success of deep convolutional networks (CNN) for object recognition and classification tasks has been leveraged for segmentation, or pixel-level classification, through the introduction of fully convolutional networks (FCN) [8], in which all fully-connected layers of a standard classification CNN are converted into convolutional layers. FCNs have been proven capable of learning high-level complex hierarchies of descriptive and discriminative features useful for per-pixel predictions [8, 9, 12]. Models inspired by FCN architectures were successfully applied and adapted to various biomedical image segmentation applications [12].

Despite their success, FCN-based segmentations suffer from relying on a pixel-level prediction that is not designed to account for higher-order properties, such as boundary smoothness and the topological label interactions of multi-part objects (as in the lumen and epithelium of glands). Moreover, FCNs tend to produce low-resolution segmentations due to the subsampling resulting from stacked layers of convolutions and pooling. To overcome FCN’s limitations, different strategies have been explored to preserve object boundaries. One approach consists of adding trainable upsampling layers using deconvolution operations [8, 9]. While these layers are useful in reconstructing the input image size from coarser outputs, they only partially recover object boundaries. Other approaches attach a dense conditional random field (CRF) to the FCN, either as a post-processing step [1] or jointly trained with the FCN [15], in order to increase the sharpness of the output. However, both approaches require extra computational costs for optimizing the CRF and only specific graphical models can be integrated into the FCN learning pipeline. To the best of our knowledge, none of the existing works incorporates topology priors in the learning of FCNs.

In this work, we propose to encode smoothness of, and topological constraints between, the segmented regions of spatially-recurring, multi-part objects (e.g. several glands, each with a lumen and an epithelium) into the learning of FCNs. Our aim is to train a deep network that produces topologically plausible, high-resolution segmentation output. Our strategy is to design a loss function with specific penalty terms that encode the desired boundary smoothness priors and the hierarchical relationships between region labels. In our specific application, the multi-region relations correspond to the containment and exclusion properties observed between the smooth lumen and epithelial gland boundaries (Fig. 1-(c,d,e)).

Our proposed loss exploits the elegant graph formulation of hierarchical label relationships used in the context of image classification [3], and the popular energy-based multi-region labelling framework introduced by Delong and Boykov [2]. In contrast to these previous works, our formulation is specifically designed for object segmentation and pixel-level interactions in an end-to-end trainable deep network. Further, our formulation does not require post-hoc processing or additional heavy, test-time computational costs associated with the previously explored CRF optimization based approaches. Extensive experiments on the publicly available Warwick-QU dataset of histology colon glands and on different FCN architectures and training strategies (e.g. combining FCNs with CRFs) demonstrate the advantage of our method in learning more regularized deep networks for gland segmentation.

2 Method

Our goal is to incorporate topological priors (containment and exclusion) and a geometrical prior (boundary smoothness) into the learning of deep fully convolutional networks. In the context of histology glands, there is a containment relation between the lumen and the epithelial boundary and an exclusion relation between the stroma and all other regions (Fig. 1-(e)). We also know that a smooth epithelial boundary separates the lumen from the stroma (Fig. 1-(a,b)).

We train an FCN from a set of images and their corresponding ground truth segmentations, \(\{(x^{(n)}, y^{(n)}); n=1,2,\ldots ,N\}\). We drop the superscript (n) when referring to any image x or segmentation y. The FCN’s prediction of y is denoted \(y^*\). A (crisp) segmentation of a color image \(x \in \mathcal {R}^{H\times W \times 3}\) assigns the p-th pixel \(x_p\) in x a vector \(y_p= ( y_p^1, y_p^2, ... ,y_p^{L} ) \in \{0,1\}^{L}\), where \(y_p^r\) indicates whether pixel \(x_p\) belongs to region r, and L is the number of region labels.
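
For illustration, here is a minimal NumPy sketch of one plausible multi-label encoding of \(y_p\) for the three gland regions of Fig. 1; the label ordering and constant names are assumptions made for this example only.

```python
import numpy as np

# Assumed label order for this example: 0 = stroma (S), 1 = lumen (U), 2 = epithelium (E).
# Because the lumen is contained in the epithelium, a lumen pixel carries both U and E,
# i.e. y_p is a multi-label binary vector rather than a one-hot vector.
STROMA     = np.array([1, 0, 0])  # stroma / background
EPITHELIUM = np.array([0, 0, 1])  # epithelial boundary only
LUMEN      = np.array([0, 1, 1])  # lumen, contained inside the epithelium

# A toy 2x2 ground-truth segmentation y of shape (H, W, L).
y = np.stack([[STROMA, EPITHELIUM],
              [EPITHELIUM, LUMEN]])
print(y.shape)  # (2, 2, 3)
```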

FCN’s per-pixel loss: Training an FCN for segmentation amounts to finding the network’s parameters \(\theta \) that solve the following optimization:

$$\begin{aligned}&\theta ^* = \arg \min \limits _{\theta }{\sum \limits _{n=1}^N \mathcal {L}(x^{(n)}; \theta )}, \end{aligned}$$
(1)
$$\begin{aligned}&\mathcal {L}(x; \theta ) = \sum \limits _{p \in \varOmega } \sum \limits _{r=1}^L - y_p^r\log { P( y_p^r = 1| x_p; \theta ) }, \quad P( y_p^r = 1 | x_p; \theta ) = \frac{\exp \big (a_r(x_p ) \big ) }{\sum \limits _{k=1}^{ L} \exp \big (a_k (x_p ) \big )} \end{aligned}$$
(2)

where \(\varOmega \) is the pixel space, \(\mathcal {L}\) is the multinomial cross-entropy loss, and P denotes the class probabilities output by the FCN’s softmax function, computed from \(a_r(x_p)\), the output activation for region r at pixel p. \(\mathcal {L}\) measures the compatibility between the predictions \(P(y_p^r=1)\) and the corresponding ground truth \(y_p^r\) for each pixel \(x_p\) in the training dataset.
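
For reference, a minimal NumPy sketch of the per-pixel softmax and cross-entropy of (1)-(2) is given below; the array shapes are assumptions made for this illustration and do not reflect the interface of the actual Caffe loss layer.

```python
import numpy as np

def cross_entropy_loss(a, y):
    """Per-pixel multinomial cross-entropy of Eqs. (1)-(2).

    a : (H, W, L) output activations a_r(x_p) of the FCN.
    y : (H, W, L) ground-truth indicators y_p^r in {0, 1}.
    """
    a = a - a.max(axis=-1, keepdims=True)                  # shift for numerical stability
    p = np.exp(a) / np.exp(a).sum(axis=-1, keepdims=True)  # softmax over the L regions
    return float(-(y * np.log(p + 1e-12)).sum())           # sum over pixels and regions
```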

Multi-region interactions: We now modify (1) and (2) by introducing hierarchical relations between region labels and adding a regularization term, and we minimize the following topology-aware loss:

$$\begin{aligned}&\theta ^* = \arg \min \limits _{\theta }{\sum \limits _{n=1}^N \alpha _1 \mathcal {L}_T(x^{(n)}; \theta ) + \alpha _2 \mathcal {L}_S( x^{(n)}; \theta ) } ; \end{aligned}$$
(3)

where \(\mathcal {L}_T \text { and } \mathcal {L}_S \) refer, in order, to the pixel-level loss functions that encode the topological relations between labels and the smoothness constraints. We elaborate on the design of each term of the proposed loss below. Note that \(\alpha _1 \text { and } \alpha _2\) are user-defined weights used to balance the contribution of each prior. We discuss the impact of these terms in the Experiments.
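
As a sketch, the objective in (3) is simply a weighted sum of the two penalty terms; the functions topology_loss and smoothness_loss below are placeholders for the terms sketched in the following paragraphs.

```python
def topology_aware_loss(a, p_hat, y, alpha1=1.0, alpha2=1.0):
    """Eq. (3): weighted combination of the topology term L_T and the smoothness term L_S.

    a      : (H, W, L) FCN output activations (used by the topology term).
    p_hat  : (H, W, L) predicted region probabilities (used by the smoothness term).
    y      : (H, W, L) ground-truth region indicators.
    alpha1, alpha2 : user-defined prior weights (equally weighted terms performed best
    on the validation set, see the Experiments section).
    """
    return alpha1 * topology_loss(a, y) + alpha2 * smoothness_loss(p_hat, y)
```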

Hierarchical label relations: The goal here is to define \(\mathcal {L}_T\) such that the network is trained not only to penalize incorrect label assignments per pixel, but also to penalize incorrect label hierarchies. In gland segmentation, for example, the fact that region U (lumen) should be contained in region E (epithelium) not only requires \(P(y_p^U = 1)\) to be high at a lumen pixel p, but so should \(P(y_p^E = 1)\) and \(P(y_p^S = 0)\). In other words, the joint probability \(P(y_p^S = 0, y_p^U = 1, y_p^E = 1)\) should be high. Given L labels or tissue classes, there are \(2^L\) possible assignments per pixel (Fig. 1-f). Some of these assignments are plausible, as they respect the label hierarchy imparted by the containment and exclusion priors, while others are not. Inspired by the strategy used in [3], which introduces a generic CRF-based approach for image classification with structured label relations, we define the following unary loss:

$$\begin{aligned} P(y_p | x_p; \theta ) = \frac{1}{Z}\, V(y_p) \prod \limits _{r=1}^{L} \exp \big ( a_r(x_p)\, y^r_p \big ) , \quad Z = \sum \limits _{y'_p \in \{0,1\}^{L}} V(y'_p) \prod \limits _{r=1}^{L} \exp \big ( a_r(x_p)\, {y'}^r_p \big ) ; \end{aligned}$$
(4)

where P is the normalized joint probability for the label vector \(y_p\), Z is the partition function, \(a_r(x_p)\) is the FCN’s output activation for region label r at pixel p, and \(V(y_p) \in \{0, 1\}\) is a validity indicator function returning 1 if a given label vector \(y_p\) corresponds to a topologically-valid assignment, and zero otherwise (see Fig. 1-(f)). The probability of a region r is computed by marginalizing over all other region labels: \(P(y_p^r = 1 |x_p; \theta ) = \sum \limits _{y_p:y_p^r=1} P(y_p | x_p; \theta )\).

Combined with a softmax loss, the hierarchical probabilities \(P(y_p^r|x_p;\theta )\) form our first penalty term \(\mathcal {L}_T\). Note that if all regions are mutually exclusive, \(P(y_p^r=1|x_p;\theta )\) is equivalent to the softmax probability defined in (2).
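
A minimal NumPy sketch of this term is given below, assuming the three valid assignments implied by Fig. 1-(f) (stroma alone, epithelium alone, and lumen contained in epithelium) and the label order (S, U, E); it illustrates (4) and the marginalization, not the actual Caffe implementation.

```python
import itertools
import numpy as np

# Assumed label order (S, U, E); valid assignments per Fig. 1-(e,f): stroma alone,
# epithelium alone, or lumen contained inside the epithelium. All other label
# vectors violate the containment/exclusion topology and get zero probability.
VALID = {(1, 0, 0), (0, 0, 1), (0, 1, 1)}
ALL_ASSIGNMENTS = list(itertools.product([0, 1], repeat=3))  # the 2^L candidate vectors

def region_marginals(a_p):
    """Topology-constrained marginals P(y_p^r = 1 | x_p) of Eq. (4) for one pixel.

    a_p : (L,) activations a_r(x_p). The joint probability of a label vector is
    proportional to V(y_p) * exp(sum_r a_r(x_p) * y_p^r).
    """
    scores = np.array([np.exp(a_p @ np.array(v)) if v in VALID else 0.0
                       for v in ALL_ASSIGNMENTS])
    joint = scores / scores.sum()                  # normalize by the partition function Z
    marginals = np.zeros_like(a_p, dtype=float)
    for prob, v in zip(joint, ALL_ASSIGNMENTS):
        marginals += prob * np.array(v)            # sum the mass of vectors with y_p^r = 1
    return marginals

def topology_loss(a, y):
    """Cross-entropy on the topology-constrained marginals (our first penalty term L_T)."""
    H, W, L = a.shape
    loss = 0.0
    for i in range(H):
        for j in range(W):
            m = region_marginals(a[i, j])
            loss += -(y[i, j] * np.log(m + 1e-12)).sum()
    return float(loss)
```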

Pairwise penalties: The goal here is to define \(\mathcal {L}_S\) such that the network is trained to produce segmentations with smooth boundaries. We encode this geometrical property via a binary pairwise label interaction softmax loss:

$$\begin{aligned} \mathcal {L}_S(x; \theta ) = \sum \limits _{p \in \varOmega } \sum \limits _{r=1}^L \sum \limits _{ q\in \mathcal {N}^{p} } B_{p,q} \times y_p^r \left| P(y_p^r| x_p; \theta ) - P(y_q^r| x_q; \theta ) \right| , \quad B_{p,q} = \left\{ \begin{array}{ll} 1 & \text {if } y_p^r = y_q^r\\ 0 & \text {otherwise} \end{array} \right. \end{aligned}$$
(5)

where \(\mathcal {N}^{p}\) is the 4-connected neighborhood of pixel p. \(\mathcal {L}_S\) trains the network to output regularized softmax label probabilities for neighbouring pixels p and q (i.e. similar predicted probabilities) when the ground truth pixel pair belongs to the same tissue label (\(B_{p,q}=1\)). At the same time, \(\mathcal {L}_S\) allows discontinuities across tissue boundaries (\(B_{p,q}=0\)).
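
A minimal NumPy sketch of this pairwise term is given below; each 4-connected pair is visited once, which differs from the full neighborhood sum in (5) only by a constant factor.

```python
import numpy as np

def smoothness_loss(p_hat, y):
    """Pairwise smoothness term L_S of Eq. (5).

    p_hat : (H, W, L) predicted region probabilities P(y_p^r | x_p).
    y     : (H, W, L) ground-truth indicators; B_{p,q} = 1 when p and q share a label.
    """
    loss = 0.0
    for axis in (0, 1):                               # vertical and horizontal neighbors
        p_q = np.roll(p_hat, -1, axis=axis)           # probabilities of neighbor q
        y_q = np.roll(y, -1, axis=axis)               # ground truth of neighbor q
        B = (y == y_q).astype(float)                  # penalize only within-region pairs
        diff = np.abs(p_hat - p_q)
        valid = np.ones(p_hat.shape[:2], dtype=bool)  # drop pairs wrapped around by np.roll
        if axis == 0:
            valid[-1, :] = False
        else:
            valid[:, -1] = False
        loss += (B * y * diff)[valid].sum()
    return float(loss)
```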

Optimization and inference: The proposed loss is optimized using stochastic gradient descent. To infer the output predictions \(y^*\) (e.g. a probability score for each region and each pixel), a simple forward pass through the trained network is required. Probabilities are computed following the label relations defined in (4). The final binary output segmentation \(y^*\) corresponds to the region with maximum probability per pixel.
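
A minimal sketch of this inference step is given below, reusing the region_marginals function from the sketch of (4) above; the activation layout is again an assumption of the illustration.

```python
import numpy as np

def predict_segmentation(a):
    """Map FCN activations from one forward pass to a crisp per-pixel labelling.

    a : (H, W, L) output activations. Each pixel receives the region with the maximum
    topology-constrained marginal probability of Eq. (4).
    """
    H, W, L = a.shape
    labels = np.zeros((H, W), dtype=int)
    for i in range(H):
        for j in range(W):
            labels[i, j] = int(np.argmax(region_marginals(a[i, j])))
    return labels
```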

3 Experiments

The implementation of our proposed model was realized as a new loss layer in the Caffe deep learning library [7] and can be used on top of any fully convolutional model, given the multi-region relations. Given the large input image size (\(500\times 500\)), we used a mini-batch size of 1 with a momentum of 0.99. The learning rate was tuned for each model on a validation set during training. We used the entire publicly available Warwick-QU colon adenocarcinoma dataset released as part of the GlaS Challenge [13], which consists of 85 training and 80 test images. In all experiments, we used 70 images for training, 15 for validation and 80 for testing, keeping the training and test splits provided by the challenge organizers. We used a series of elastic (warping) and affine transformations (rotation, scaling, color shifts) to augment the training dataset by a factor of \(\sim 150\). All models were trained on an NVIDIA 12 GB GPU card, and training time ranged between 2 hours for relatively small models (\(\sim \)6 layers) and 36 hours for deeper models. Test times were \(\sim \)1 s/image for all models.
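
The elastic part of this augmentation is not specified beyond "warping"; one common way to implement it is via a smoothed random displacement field, sketched below. The alpha and sigma values are illustrative placeholders, not the settings used for the reported experiments, and the same field must also be applied to the ground-truth label map so image and segmentation stay aligned.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def elastic_warp(image, alpha=300.0, sigma=20.0, rng=None):
    """Elastic deformation of an (H, W, 3) image via a smoothed random displacement field.

    alpha controls the displacement magnitude and sigma its smoothness; both values
    are illustrative placeholders, not the settings used in the experiments.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    h, w = image.shape[:2]
    dx = gaussian_filter(rng.uniform(-1, 1, (h, w)), sigma) * alpha
    dy = gaussian_filter(rng.uniform(-1, 1, (h, w)), sigma) * alpha
    yy, xx = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    coords = np.array([yy + dy, xx + dx])        # displaced sampling coordinates
    warped = np.empty_like(image)
    for c in range(image.shape[2]):              # warp each color channel
        warped[..., c] = map_coordinates(image[..., c], coords, order=1, mode="reflect")
    return warped
```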

To test the advantage of adding topological priors to the learning of FCNs, we compared the performance of four network architectures that implement different upsampling strategies for border sharpening, with an increasing number of layers, trained with vs. without our multi-region priors \(\mathcal {L}_T \text { and } \mathcal {L}_S\): (i) Alexnet-FCN and (ii) FCN-8s (with a stride of 8) [8] use simple bilinear interpolation for upsampling; (iii) U-Net [12] includes bridge-like layers between coarser layers’ outputs and finer layers; whereas (iv) DN [9] uses deconvolution layers as its upsampling strategy.

We used two evaluation metrics: (1) pixel-level accuracy and (2) object-level Dice similarity coefficient (Fig. 2). Our results show that, for the same optimizer and the same network complexity, using our proposed loss yields an average improvement of 9 to 15 % in correctly labelling pixels and 3 to 5 % in delineating glands.
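
For reference, pixel accuracy and the binary Dice coefficient are sketched below; the GlaS object-level Dice additionally matches predicted objects to ground-truth objects and weights the per-object scores [13], a step this sketch deliberately omits.

```python
import numpy as np

def pixel_accuracy(pred, gt):
    """Fraction of pixels whose predicted region label matches the ground truth."""
    return float((pred == gt).mean())

def dice(pred_mask, gt_mask):
    """Dice similarity coefficient between two binary object masks."""
    intersection = np.logical_and(pred_mask, gt_mask).sum()
    denom = pred_mask.sum() + gt_mask.sum()
    return 2.0 * intersection / denom if denom > 0 else 1.0
```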

We tested the robustness of our results to the hyper-parameters in (3). We used a validation set to tune these parameters and found that, regardless of the model’s architecture, using equally weighted penalty terms generally gave the best results. We observed a minimal change in pixel accuracy and object Dice (less than \(10^{-4}\)) when varying the difference between \(\alpha _1 \text { and } \alpha _2\) by \(\pm 20\,\%\).

We also compared our method with the winner of the GlaS Challenge [13], CuMedVision2, which also used an FCN-based model with a special upsampling strategy. Note that the winner’s model architecture was not released and only the number of pooling layers was reported [13]. For a fair comparison, we report results with FCN-8s, which has a similar number of pooling layers. Using our topology-aware loss with the FCN-8s architecture, we outperformed the reported results of CuMedVision2 by 18 % in F1 score and 3 % in object Dice, while CuMedVision2 surpassed our approach by 12 % in terms of Hausdorff distance.

Fig. 2. Advantage of the proposed loss: “Original” refers to the cross-entropy loss \(\mathcal {L}\). “+Smoothness” refers to using the proposed penalty term \(\mathcal {L}_S\), and “+Topology” refers to adding our topology prior \(\mathcal {L}_T\). The asterisk (*) marks statistically significant differences from the original models, obtained using a Wilcoxon matched-pairs signed-rank test at p < 0.05.

Table 1. Penalty terms vs. graphical models. +Smoothness refers to adding \(\mathcal {L}_S\) in the FCN-32s training. +Smoothness+Topology refers to \(\mathcal {L}_T +\mathcal {L}_S \).
Fig. 3. Qualitative comparisons. Note the smoother boundaries and individually detected glands produced by our method (last two columns). Red arrows highlight challenging cases that were not successfully segmented.

To compare the proposed loss penalties against graphical-model refinements, we evaluated FCN-32s (with a stride of 32) trained with \(\mathcal {L}_T+\mathcal {L}_S\) against: (a) the original FCN-32s model, which optimizes the per-pixel loss (\(\mathcal {L}\)); and two methods that refine the FCN’s segmentation by incorporating a probabilistic graphical model: (b) DeepLab [1], which applies, as a post-processing step, a fully-connected CRF whose pairwise terms depend on pixel positions and color intensities, and (c) CRF-RNN [15], in which the same CRF model is jointly trained with the FCN. In DeepLab and CRF-RNN, the CRF energy function is optimized using iterations of the mean-field approximation.

As shown in Table 1, with our additional smoothness and topology priors in the training of FCN-32s, our model achieves 13 to 38 % higher object Dice than the original FCN-32s, DeepLab, or CRF-RNN. It is also worth pointing out that, unlike DeepLab and CRF-RNN, our proposed method does not incur any additional computational cost during inference.

It is worth noting that DeepLab and CRF-RNN degrade the performance of the original FCN-32s model. This initially surprising result may be explained by the fact that the special CRF model used in DeepLab and CRF-RNN includes image (color)-based pairwise terms in its energy function, which are sensitive to stain variations among glands and between stroma and glands.

Finally, we observe that adding our topology priors results in a 10 % increase in object Dice over FCN-32s (from 0.70 to 0.80), despite a smaller 4 % decrease in pixel accuracy (from 0.80 to 0.76). This implies that the additional priors are critical for the detection of individual glands, particularly because the topology prior encodes relevant object-level (i.e. beyond pixel-level) information during training. Qualitative results are presented in Fig. 3. Adding topology penalties generally resulted in smoother boundaries and individually segmented glands. However, it did not fully compensate for the loss of fine-grained detail resulting from upsampling the probabilities in some very challenging cases where gland boundaries are extremely thin.

4 Conclusion

We hypothesized that the inclusion of prior knowledge in the training of deep fully convolutional networks for the segmentation of histology glands can result in more accurate segmentations. To test our hypothesis, we presented a novel loss function inspired by energy-based models for multi-region labelling and adapted for deep networks. Our findings show that our approach yields significantly more accurate and plausible segmentations while being more computationally efficient at test-time. We plan to further investigate the effect of equipping deep learning models with relevant prior knowledge for training more regularized networks on different medical segmentation applications.