1 Motivation

The digitization of histopathological tissue with whole slide scanners yields whole slide images (WSIs) with resolutions in the gigapixel range. Since whole series of these huge images typically need to be investigated, manual segmentation for tissue analysis is extremely time consuming, which calls for automated segmentation approaches. In this work, we consider the task of segmenting pathological tissue showing variable levels-of-progression (LoP), which lead to clear changes in tissue morphology and texture (Fig. 1). This application scenario plays a vital role in assessing treatments for various diseases, such as renal or liver diseases as well as various kinds of cancer.

Fig. 1. Progression of a specific disease (unilateral ureteral obstruction), showing healthy tissue (left) and clearly visible changes after 21 days (right).

Fig. 2. Magnifications showing variability of glomerular appearances.

The main challenge in segmenting WSIs showing pathological changes is that the image characteristics can vary strongly between pathologies and that the LoP, i.e. the degree of change, also varies strongly within an individual pathology. Although state-of-the-art machine learning approaches, such as fully-convolutional networks (FCNs) [2, 10], can be assumed to learn this variability if a huge amount of annotated training data is available, a fully-supervised learning scenario is not feasible here for practical reasons. This calls for domain adaptation, which can be used to adapt a model trained on healthy data to a specific combination of a disease and an LoP. With traditional domain adaptation, adaptation would be required individually for each LoP, which is potentially computationally expensive but still feasible. Such an approach, however, only incorporates specific data in each adaptation step (source domain data and data of the domain to adapt to) and does not make use of the knowledge that disease progression is a gradual process. For research purposes in the life sciences, series of WSIs are often prepared for which the LoP is known and can be used without any further manual annotation effort. We focus on developing a so-called “gradual” domain adaptation framework that exploits this prior knowledge and incorporates all available unlabeled data as well as the information on the LoP (Fig. 2).

Related Work: CNNs have been applied effectively to histopathological image analysis. Classification of WSIs with different targets was performed in [1, 7]. CNNs also yielded excellent performance in segmenting and classifying histopathological image data [2, 12]. For segmentation, variations of fully-convolutional networks were applied in the field of biomedical image analysis [2, 10], especially due to their high efficiency compared to sliding-window segmentation. Due to the high manual effort required for annotations, domain adaptation for semantic segmentation has become a hot topic [4, 13]. Recently, completely unsupervised approaches in particular have become highly popular. In [4], domain adaptation was performed by means of generative adversarial networks on image level, which means that images are converted between two domains. This requires that the morphological structure does not change significantly between the domains, which is often not the case in histopathological images (Fig. 1). Other approaches rely on adversarial networks [9] or on curriculum learning [13] to adjust dissimilarities between domains in a deeper representation. In [5], unsupervised domain adaptation based on retraining was performed by incrementally switching from the source domain to the target domain.

Contributions: In this work, we address the task of segmenting WSIs showing variable LoPs by means of a novel gradual adaptation framework. The domain shift addressed here differs from the setting that is typically considered. Instead of nominal domain classes, we consider healthy data as well as variable LoPs, leading to an ordinal domain variable. To exploit this specific knowledge of the domain shift and to tackle the challenge of high variability in histopathological image data, we propose a gradual unsupervised domain adaptation framework that slowly learns the structural changes caused by the individual stages of pathologies. To this end, a model trained on healthy data is iteratively adjusted to pathological data, where the LoP is increased after each step. For adaptation between consecutive LoPs, we rely on an established model proposed in [5] (which could easily be replaced by other approaches).

Instead of the typical nominal domain shift (i.e. adaptation between domain A and domain B), a “gradual”, continuous shift between two domains is considered, which has, to the best of our knowledge, not been investigated before. The proposed framework can also be interpreted as an extension of the incremental method [5] that makes use of intermediate domains. Besides the overall pipeline, the contribution consists of a novel adaptation control stage as well as a domain gap assessment stage, which are introduced in order to prevent misdirected adaptation. The pipeline is evaluated on three datasets corresponding to different renal pathologies.

Fig. 3. Based on a pretrained model, domain gap estimation is conducted to assess the difference in distribution between consecutive LoPs. Then, after an initial segmentation, reliably segmented patches are selected (adaptation control) and added to the adaptation dataset used for fine-tuning. Whereas intra-level adaptation is performed by incrementally adding data of the current target domain (\(\mathcal{T}_i\)), inter-level adaptation is achieved by changing from domain \(\mathcal{T}_i\) to domain \(\mathcal{T}_{i+1}\).

2 Gradual Domain Adaptation Pipeline

In the following, healthy data is referred to as source domain data \(\mathcal{S}\) and pathological data as target domain data \(\mathcal{T}\). The target domain data is partitioned into several LoPs \(\mathcal{T}_{1}, \ldots, \mathcal{T}_{n}\) on an ordinal scale (i.e. \(\forall i, 0<i<n : \mathcal{T}_{i+1}\) is more affected than \(\mathcal{T}_{i}\)). As a first step, a fully-convolutional network is pretrained on source domain data, resulting in a pretrained model \(\mathcal{M}_{S}\) (Fig. 3). Next, the domain gap between the source domain and the first LoP is assessed to estimate the number of adaptation iterations \(n_{adapt}\) (Sect. 2.1). Initial pathological WSI segmentations are then obtained by segmenting patches extracted from \(\mathcal{T}_{1}\) with \(\mathcal{M}_{S}\). Afterwards, adaptation control is performed to prevent the network from adapting in a wrong direction (Sect. 2.2). Following this adaptation control stage, a random training subset is assembled from source domain patches as well as selected patches from the current and the previous domain together with their segmentation outputs (which serve as the new training ground-truth), similar to [5]. This data is used to adapt \(\mathcal{M}_{S}\) to the domain \(\mathcal{T}_{1}\). As in [11], all parameters of the pretrained network are fine-tuned to the new domain. After this first adaptation step, the training set is iteratively refined by re-segmenting all extracted patches with the updated model and constructing a new training set from the improved segmentations (Fig. 3, intra-level adaptation).
Training and updating steps are repeated for the specified number of adaptation iterations \(n_{adapt}\), after which the model is declared fully adapted to the current pathological stage and constitutes the pathological model \(\mathcal{M}_{T1}\). For adaptation to the successive target domains \(\mathcal{T}_{2}, \ldots, \mathcal{T}_{n}\), the respective previous pathological model \(\mathcal{M}_{T1}, \ldots, \mathcal{M}_{T(n-1)}\) is treated as the source model and the procedure described above is applied recursively (Fig. 3, inter-level adaptation).
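
The control flow described above can be summarized in a short sketch. This is only an illustration under stated assumptions, not the authors' implementation: all function and parameter names (`gradual_adaptation`, `estimate_n_adapt`, `segment`, `select_reliable`, `build_training_set`, `fine_tune`) are hypothetical placeholders that the caller supplies, and carrying the last set of reliable patches forward as "previous domain" data is one possible reading of the text.

```python
from typing import Any, Callable, List, Sequence, Tuple

Labeled = Tuple[Any, Any]  # (patch, segmentation mask); types are placeholders


def gradual_adaptation(
    model: Any,
    source: Sequence[Labeled],
    levels: Sequence[Sequence[Any]],
    estimate_n_adapt: Callable[[Sequence[Any], Sequence[Any]], int],
    segment: Callable[[Any, Any], Any],
    select_reliable: Callable[[List[Labeled]], List[Labeled]],
    build_training_set: Callable[
        [Sequence[Labeled], Sequence[Labeled], Sequence[Labeled], int],
        List[Labeled],
    ],
    fine_tune: Callable[[Any, List[Labeled]], Any],
) -> Any:
    """Adapt `model` level by level; `levels` holds T_1 ... T_n in order."""
    previous_patches: List[Any] = [patch for patch, _ in source]
    previous_labeled: List[Labeled] = []  # no previous target domain before T_1

    for patches in levels:  # inter-level adaptation: T_i -> T_{i+1}
        # Domain gap estimation between the two consecutive domains (Sect. 2.1).
        n_adapt = estimate_n_adapt(previous_patches, patches)
        reliable: List[Labeled] = []

        for j in range(1, n_adapt + 1):  # intra-level adaptation
            # Re-segment all patches with the current model (pseudo labels).
            pseudo = [(p, segment(model, p)) for p in patches]
            # Adaptation control: keep only reliably segmented patches (Sect. 2.2).
            reliable = select_reliable(pseudo)
            # Mix source, previous-domain and current-domain data for step j.
            train = build_training_set(source, previous_labeled, reliable, j)
            model = fine_tune(model, train)

        # Assumption: the last reliable patches act as previous-domain data.
        previous_patches = list(patches)
        previous_labeled = reliable

    return model
```

Passing the individual stages as callables keeps the loop independent of the concrete adaptation method, which mirrors the remark that the method of [5] could be replaced by other approaches.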

2.1 Domain Gap Estimation (DGE)

During training, WSIs are arranged in progressive order, which facilitates the gradual adaptation process and enables the investigation of gaps between successive domains. Since the severity of these gaps varies, it is reasonable to adjust the number of adaptation iterations \(n_{adapt}\) to the domain gap, as adapting to small gaps requires less effort than learning the changes induced by severe gaps. To take this into account, we train a separate network that distinguishes between the two consecutive domains. One set of patches is selected for training and another (distinct) set for evaluation. The domain gap is finally represented by \(d_{gap} = 1 - c_{acc}\), where \(c_{acc}\) is the classification accuracy. Low accuracies indicate small domain differences, as the network cannot learn to separate the domains sufficiently, whereas high accuracies imply severe changes, which can easily be identified. Note that \(d_{gap}\) therefore decreases as the domains become easier to distinguish. This measure is combined with a weight \(w_{gap}\) and used to set the relative increment of current domain data \(r_{adapt}= w_{gap} \cdot d_{gap}\) and finally the number of adaptation iterations \(n_{adapt} = \frac{1-r_{source}}{r_{adapt}}\), where \(r_{source}\) is the fixed source domain portion of the training set (denoted \(r_{S}\) in Sect. 2.2). Severe gaps thus lead to small increments and correspondingly more adaptation iterations.
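
As a worked illustration of these formulas, the following sketch computes \(d_{gap}\), \(r_{adapt}\) and \(n_{adapt}\) from a classifier accuracy. The function name and the numeric inputs are hypothetical, and rounding \(n_{adapt}\) to the nearest integer (with a minimum of one iteration) is an assumption, since the text does not state how non-integer values are handled.

```python
from typing import Tuple


def adaptation_schedule(
    c_acc: float, w_gap: float, r_source: float
) -> Tuple[float, float, int]:
    """Return (d_gap, r_adapt, n_adapt) for one pair of consecutive domains."""
    if not 0.0 <= c_acc < 1.0:
        # c_acc == 1 would give d_gap == 0 and a division by zero below.
        raise ValueError("c_acc must lie in [0, 1)")
    if w_gap <= 0.0 or not 0.0 <= r_source < 1.0:
        raise ValueError("w_gap must be positive and r_source in [0, 1)")

    d_gap = 1.0 - c_acc            # d_gap = 1 - c_acc
    r_adapt = w_gap * d_gap        # relative increment of current domain data
    # Assumption: round to the nearest integer, at least one iteration.
    n_adapt = max(1, round((1.0 - r_source) / r_adapt))
    return d_gap, r_adapt, n_adapt


# Hypothetical values: an easily separable pair (accuracy 0.9) receives more
# adaptation iterations than a hardly separable pair (accuracy 0.6).
print(adaptation_schedule(c_acc=0.9, w_gap=1.0, r_source=0.5)[2])  # 5
print(adaptation_schedule(c_acc=0.6, w_gap=1.0, r_source=0.5)[2])  # 1
```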

2.2 Adaptation Control (AC)

Since adaptation follows a completely unsupervised concept, the generation of target domain labels \(Y_{T}\) carries the risk of strong discrepancies between the obtained labels and ground-truth data. More specifically, preservation of the glomerulus segmentation objective cannot be guaranteed. To anchor the actual segmentation task, the source data and the corresponding ground-truth segmentations used for source model training are therefore incorporated into the training set. The training set is split into a source domain portion \(r_{S}\), a target domain portion \(r_{T_i}\) and a previous domain portion \(r_{T_{i-1}}\), where \(r_{S}\) is fixed, \(r_{T_i} = j \cdot r_{adapt}\) (\(j\) is the counter during intra-level adaptation, Fig. 3) and \(r_{T_{i-1}} = 1 - r_{S} - r_{T_i}\).
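
The evolution of the three portions over the intra-level counter \(j\) can be illustrated as follows. This is a minimal sketch with hypothetical values; the function name is not from the paper, and clamping the current-domain portion at \(1 - r_{S}\) is an assumption that guards against negative previous-domain portions.

```python
from typing import Tuple


def training_set_portions(
    r_s: float, r_adapt: float, j: int
) -> Tuple[float, float, float]:
    """Return (r_S, r_Ti, r_T(i-1)) for intra-level iteration j (j >= 1)."""
    if j < 1:
        raise ValueError("j counts intra-level iterations starting at 1")
    # r_Ti = j * r_adapt; clamping at 1 - r_S is an assumption.
    r_t_cur = min(j * r_adapt, 1.0 - r_s)
    # The remainder is filled with data from the previous domain.
    r_t_prev = 1.0 - r_s - r_t_cur
    return r_s, r_t_cur, r_t_prev


# Hypothetical values r_S = 0.5 and r_adapt = 0.125 (i.e. four iterations):
for j in range(1, 5):
    print(j, training_set_portions(0.5, 0.125, j))
# 1 (0.5, 0.125, 0.375)
# 2 (0.5, 0.25, 0.25)
# 3 (0.5, 0.375, 0.125)
# 4 (0.5, 0.5, 0.0)
```

In this example the previous-domain data is phased out completely in the last iteration, while the source portion stays constant as an anchor.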

To measure the suitability of segmentations to be considered for adaptation, two metrics are introduced. The first measure exploits the architecture of the proposed pipeline by assessing changes after each adaptation iteration. Specifically, the pixelwise distance of the segmentations is considered

$$\begin{aligned} \delta _{segm} = \frac{1}{\sum _{m,n}y_{T}(m,n)}\sum _{i,j} | y_{T,previous}(i,j)-y_{T}(i,j) | \end{aligned}$$
(1)

with \(y_{T,previous}\) and \(y_{T}\) describing single segmentation masks from the previous and current domain, respectively, and \((i,j)\), \((m,n)\) defining the two-dimensional pixel position. A high measure (i.e. large discrepancies between the previous and current state) indicates potential for further adjustment, whereas a low measure demonstrates the network’s confidence. As a second metric, the plausibility of the shape of each segmentation is evaluated. For the considered application scenario (where objects show near-circular shapes even in case of pathologies), the pixelwise distance between the segmentation mask and a mask \(y_{circ}\), containing the smallest circles enclosing each individual object, is measured:

$$\begin{aligned} \delta _{shape} = \frac{1}{\sum _{m,n}y_{T}(m,n)}\sum _{i,j} \left( y_{circ}(i,j)-y_{T}(i,j) \right) \end{aligned}$$
(2)

The weighted (\(w_{shape}\)) combination of both values constitutes the cost function \(c_{patch} = \delta _{segm} + w_{shape} \delta _{shape}\). The cost value, or rather the adaptation potential, of each patch is taken into account during training set construction by omitting patches whose cost exceeds the threshold \(c_{thresh}\). This analysis is performed individually for each iteration step as outlined in Fig. 3.
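
The two measures and the resulting patch cost can be sketched with NumPy as follows. This is an illustration under assumptions rather than the authors' code: masks are taken to be binary arrays, the enclosing-circle mask is passed in precomputed (it could, for instance, be derived per object with OpenCV's `cv2.minEnclosingCircle`), the function names are hypothetical, and raising an error for an empty current mask is an assumption because Eqs. (1) and (2) are undefined in that case.

```python
import numpy as np


def _foreground_area(y_cur: np.ndarray) -> int:
    """Normalization term: number of foreground pixels in the current mask."""
    area = int(np.count_nonzero(y_cur))
    if area == 0:
        # Assumption: the measures are undefined for empty masks.
        raise ValueError("current segmentation mask is empty")
    return area


def delta_segm(y_prev: np.ndarray, y_cur: np.ndarray) -> float:
    """Eq. (1): normalized pixelwise change between two segmentations."""
    diff = np.abs(y_prev.astype(np.int64) - y_cur.astype(np.int64))
    return float(diff.sum() / _foreground_area(y_cur))


def delta_shape(y_circ: np.ndarray, y_cur: np.ndarray) -> float:
    """Eq. (2): normalized area between enclosing circles and the mask."""
    diff = y_circ.astype(np.int64) - y_cur.astype(np.int64)
    return float(diff.sum() / _foreground_area(y_cur))


def patch_cost(y_prev, y_cur, y_circ, w_shape: float = 1.0) -> float:
    """c_patch = delta_segm + w_shape * delta_shape."""
    return delta_segm(y_prev, y_cur) + w_shape * delta_shape(y_circ, y_cur)


def keep_patch(cost: float, c_thresh: float) -> bool:
    """Patches whose cost exceeds the threshold are omitted."""
    return cost <= c_thresh


# Toy masks (y_circ is only a stand-in superset, not a true circle).
y_cur = np.zeros((5, 5), dtype=np.uint8)
y_cur[1:3, 1:3] = 1                      # 4 foreground pixels
y_prev = np.zeros_like(y_cur)
y_prev[1:3, 1:4] = 1                     # differs from y_cur in 2 pixels
y_circ = np.zeros_like(y_cur)
y_circ[1:4, 1:4] = 1                     # 9 pixels, contains y_cur

print(patch_cost(y_prev, y_cur, y_circ))  # 0.5 + 1.0 * 1.25 = 1.75
```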

3 Experiments

As segmentation model, we use the U-Net [10], which combines up-sampled features from deep, coarse layers with those from shallow, fine layers and exhibits excellent performance in biomedical applications. For domain discrimination, an architecture similar to the GAN discriminator model in [8] is used, comprising six blocks of \(5\times 5\) convolutions with valid border mode and stride 2, each followed by a leaky ReLU with \(\alpha =0.2\) and a dropout layer with a rate of 0.4.
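
To make the spatial reduction of the discriminator concrete, the following sketch applies the standard output-size rule for unpadded ("valid") strided convolutions, \(\lfloor (s - k)/\text{stride} \rfloor + 1\), six times. The input size of 492 corresponds to the patch size used in the experiments; the function names are hypothetical, and channel counts are left out because the text does not specify them.

```python
from typing import List


def valid_conv_output(size: int, kernel: int = 5, stride: int = 2) -> int:
    """Spatial output size of a convolution without padding ('valid')."""
    if size < kernel:
        raise ValueError("input is smaller than the kernel")
    return (size - kernel) // stride + 1


def discriminator_feature_sizes(input_size: int = 492, blocks: int = 6) -> List[int]:
    """Side length of the feature map after each of the convolution blocks."""
    sizes = []
    for _ in range(blocks):
        # Leaky ReLU and dropout do not change the spatial size.
        input_size = valid_conv_output(input_size)
        sizes.append(input_size)
    return sizes


print(discriminator_feature_sizes())  # [244, 120, 58, 27, 12, 4]
```

Under these assumptions, a \(492\times 492\) patch is reduced to a \(4\times 4\) feature map after the sixth block.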

The image datasets comprise PAS-stained WSIs of mouse kidneys. Slides are fixed in methyl Carnoy’s solution and embedded in paraffin, as described in detail in [3]. Images are captured by the whole slide scanner model C9600-12 by Hamamatsu with a 40x objective lens. Evaluation is performed on three different pathological datasets corresponding to the diseases unilateral ureteral obstruction (UUO), ischemia reperfusion (IR) and nephrotoxic nephritis (NTN). All datasets comprise a staged progression of pathological alterations, ranging from moderate to severe structural changes (9 stages for UUO, 10 for IR and 4 for NTN). Whereas for UUO and IR the LoP is given by the time elapsed after induction of the disease, a visual rating was applied for NTN, as the outbreak of the disease varies strongly in this case. For a total of 86 WSIs, including 26 healthy and 60 pathological cases (UUO: 37, IR: 19, NTN: 4 (between 1 and 3 WSIs for each LoP)), manual ground-truth annotations are acquired in cooperation with a pathological expert, providing about 9300 annotated glomeruli.

In order to facilitate unbiased evaluation, the experiments are repeated three times and the means are reported. For source domain training, we extract 2,000 patches with a size of \(492\,\times \,492\) pixels from 26 training WSIs. 80% of the source domain data are used for training and 20% for evaluation. For domain adaptation, 140 patches for each LoP and for each individual pathology are employed for training and a further 28 are used for evaluation. The domain gap estimator network is accordingly trained with 112 and evaluated with a further 28 patches with a size of \(492\times 492\) pixels. The parameters \(r_{S}\) and \(w_{gap}\) are determined via grid search. To obtain unbiased results, the parameters for one dataset (which corresponds to one pathology) are chosen by searching for the best combination on the other two (distinct) datasets (see Sect. 3.1). Due to the enormous computational effort required for optimization, \(w_{shape}\) is fixed to 1, which means that \(\delta _{segm}\) and \(\delta _{shape}\) contribute equally during adaptation control. For training the networks, the batch size is set to 1 and L2-regularization is used [6]. All networks are trained using cross-entropy as loss function and Adam as optimizer with learning rate \(\nu = 10^{-5}\) [6]. Initial training is performed for 100 epochs on the source domain data [6]. Each adaptation step consists of a further 5 epochs.

3.1 Results

The overall segmentation performance (mean over three random splits) obtained with the proposed approach for the three datasets UUO, IR and NTN is provided in Fig. 4. We report the Dice similarity coefficient (DSC) individually for each LoP. The horizontal axis for UUO and IR is scaled according to the LoP in days, exhibiting an interval scale (we expected more important information in the higher LoP range). For the ordinal scale in case of NTN, such a scaling is not reasonable. For healthy data (not shown in Fig. 4), we obtain a DSC of 0.92 (±0.01) with a mean precision of 0.93 and a mean recall of 0.91. In almost every configuration, source domain training and traditional domain adaptation (i.e. without gradual adaptation) are outperformed by the novel approach. Similarly, the reference settings without AC and without DGE are exceeded in the majority of cases. The most significant improvements are obtained in case of NTN (improvements of more than 6% for LoP 1, 3 and 4), which represents the most difficult application scenario. Evaluation took on average 135 s per WSI (NVIDIA TITAN X GPU).
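
For reference, the DSC can be expressed through pixel counts or, equivalently, as the harmonic mean of precision and recall. The sketch below uses hypothetical function names and hypothetical pixel counts; it also shows that the reported healthy-data precision and recall are consistent with the reported DSC, keeping in mind that a mean over per-image DSCs need not coincide exactly with the value derived from mean precision and recall.

```python
def dice_from_counts(tp: int, fp: int, fn: int) -> float:
    """DSC = 2*TP / (2*TP + FP + FN) for one binary segmentation."""
    denominator = 2 * tp + fp + fn
    if denominator == 0:
        raise ValueError("DSC is undefined when both masks are empty")
    return 2 * tp / denominator


def dice_from_precision_recall(precision: float, recall: float) -> float:
    """DSC as the harmonic mean of precision and recall."""
    if precision + recall == 0:
        raise ValueError("DSC is undefined for zero precision and recall")
    return 2 * precision * recall / (precision + recall)


# Hypothetical pixel counts: 8 true positives, 2 false positives, 2 misses.
print(dice_from_counts(tp=8, fp=2, fn=2))                   # 0.8

# Reported healthy-data values: precision 0.93, recall 0.91.
print(round(dice_from_precision_recall(0.93, 0.91), 2))     # 0.92
```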

Fig. 4. Segmentation performance (Dice similarity coefficient (DSC)) of the proposed gradual adaptation approach in comparison to traditional adaptation, source domain training, the proposed approach without DGE and the proposed approach without AC.

4 Discussion

In this work, we propose and investigate unsupervised “gradual” domain adaptation. The results show that the proposed framework outperforms domain adaptation without gradual refinement. Increased DSCs are obtained not only for late LoPs, but over the whole range and also for early stages (especially in case of IR and NTN). This is notable because tissue changes due to disease progression are clearly more subtle in the early days than in the late stages, where a significant improvement was expected. The advantage obtained by applying adaptation control is on average most distinct for IR. For NTN, we do not observe a positive overall effect, which is presumably due to the fact that the morphology of the object changes slightly in case of this pathology, so that the circular constraint could be inappropriate. Domain gap estimation leads to slight improvements in case of UUO and IR but, again, not in case of NTN. This could be attributed to the higher difficulty of the application scenario. Thus, it is not necessarily advantageous to increase the number of training cycles in case of stronger domain shifts. The different outcome in case of NTN is presumably also due to the visual rating used for partitioning into the LoPs. In particular, domain gap estimation is likely less important when the data is visually presorted.

To conclude, we reinterpreted a specific biomedical domain adaptation scenario and proposed and investigated an unsupervised gradual domain adaptation framework which exploits the knowledge that disease progression is a gradual process. The proposed method (including the proposed regularization stages to prevent misdirected adaptation) exhibited excellent performance and outperformed conventional domain adaptation that does not incorporate the gradual process of disease progression. Evaluation was performed for three different pathologies in renal histopathology. However, we expect that the method could be applied effectively to other application scenarios as well and that, if needed, components (e.g. the domain adaptation method) can easily be adjusted or exchanged.