Abstract
The lack of labeled training data is one of the major challenges in the era of big data and deep learning. Especially for large and complex images, the acquisition of expert annotations becomes infeasible and, although many microscopy images contain repetitive and regular structures, the manual annotation effort remains expensive. To this end, we propose an approach to obtain image slices and corresponding annotations for confocal microscopy images showing fluorescently labeled cell membranes in an automated and unsupervised manner. Due to their regular structure, cell membrane positions are modeled in silico and the respective raw images are synthesized by generative deep learning approaches. The resulting synthesized data set is validated based on the authenticity of the generated images and their utilizability for training an existing deep learning segmentation approach. We show that segmentation accuracy nearly reaches state-of-the-art performance for fluorescently labeled cell membranes in A. thaliana, without the expense of manual labeling.
D. Eschweiler and T. Klose—Authors contributed equally to this work.
1 Introduction
In developmental biology, a large variety of cellular characteristics can be studied by the analysis of cell shapes. Obtaining precise manual segmentations of cell membranes for detailed morphological analysis is a tedious task, due to large image sizes, the proximity of cells and vanishing fluorescence intensities in deeper tissue layers. Poor or missing manual annotations limit the performance of learning-based segmentation approaches, especially in challenging image regions [3, 6], which could be partly improved by the incorporation of deep learning methods [2, 9]. However, in order to leverage the power of recent deep learning approaches and to train generalized models, large amounts of data are required. Reducing labeling expense while still obtaining enough annotations is often accomplished by data augmentation [5] or sparse annotations [13]. Since augmentations have to remain within biologically appropriate ranges, the amount and variety of data is limited and often cannot account for generalized models. Transfer learning approaches overcome this issue by training models on large data sets from slightly related domains and using only small portions of labeled data from the target domain to fine-tune the model [5].
Both approaches, however, cannot completely eliminate the need for manual annotations. To achieve complete independence from manual interaction, data needs to be synthesized, which has been addressed for several biological experiments. These approaches range from physical modeling of cells to the generation of images based on classical image features [8, 10, 11]. More recently, generative adversarial approaches proved to achieve good results for data augmentation [5], as well as for data generation [4, 7].
Although previous methods work well for cell synthesis, they cannot be straightforwardly adapted to cell membranes, which have a fundamentally different appearance. We propose a method that approximates the complex, densely connected membrane network by combining randomly sampled points and Voronoi diagrams. Subsequently, generative deep learning models are used to translate the obtained membrane segmentation to the image domain.
The main contributions of this work are (1) a parametrized generation of membrane segmentations and (2) an unsupervised translation of the generated annotations to the image domain by (3) using structure-aware losses, which ensure matching membrane locations in the label and image domain. Furthermore, (4) the proposed method offers a way to generate complete segmentations, even in regions of low or vanishing fluorescence signals, which constitute the most challenging and time-consuming regions for manual annotators.
For validation, we used annotated 2D slices (Fig. 1) of microscopy images showing fluorescently labeled cell membranes in Arabidopsis thaliana [12] and generated three additional data sets with different levels of abstraction. For each data set, the authenticity of the generated images as well as the utilizability for training an existing segmentation approach [2] were assessed.
Fig. 1. (A) Cropped 2D slices from a 3D confocal image stack of fluorescently labeled cell membranes in A. thaliana and (B) the associated multi-instance segmentation [12].
2 Method
The proposed method follows a top-down approach by generating annotations first and subsequently synthesizing corresponding images. In this manner, the availability of complete annotations can be ensured and we are able to obtain images with arbitrary fluorescence intensity without losing information about true labels, even in regions of vanishing fluorescence signals. The proposed method can be divided into two sub-tasks, namely automated label generation and translation of labels to the image domain.
2.1 Label Generation
As shown in [2], the generation of final instance segmentations can be improved by introducing an intermediate step that reformulates the instance segmentation problem as a 3-class semantic segmentation, dividing the image into background, membrane positions and cell centroids. This alternative representation describes annotations in a more general way and is utilized as the baseline for synthesized annotations. As an initial step, an arbitrary but plausible specimen shape is generated, dividing the image into foreground and background. Within the foreground region, rough cell locations are modeled by a predefined number of points \(n_{points}\) at randomly sampled positions. To prevent points from being too close to one another and generating unnatural changes of size among neighbouring cells, a k-means clustering approach is utilized. Clustering is performed for a predefined number of iterations \(n_{iter}\), which allows further control over the uniformity of distances between cell centroids. Based on the resulting cell positions, a Voronoi diagram is constructed to partition the foreground region into separated instances, with each of them representing a single cell. Therefore, morphological cell appearance and the final cell count are parametrized by the parameter tuple \((n_{points}, n_{iter}, k)\), as illustrated by the sketch below.
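The following Python sketch illustrates one possible reading of this label generation step. The circular foreground shape, the way the parameters \((n_{points}, n_{iter}, k)\) enter the clustering and all helper values are illustrative assumptions rather than the authors' exact implementation.

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans

def generate_labels(shape=(512, 512), n_points=4000, n_iter=100, k=20, seed=0):
    """Sketch of the parametrized label generation (Sect. 2.1)."""
    rng = np.random.default_rng(seed)
    h, w = shape

    # Simple circular foreground as an arbitrary but plausible specimen shape.
    yy, xx = np.mgrid[0:h, 0:w]
    foreground = (yy - h / 2) ** 2 + (xx - w / 2) ** 2 < (0.45 * min(h, w)) ** 2
    fg_coords = np.argwhere(foreground)

    # Rough cell locations: n_points random positions inside the foreground,
    # clustered into k groups; n_iter limits the k-means iterations and thereby
    # controls how uniformly the resulting centroids are spread.
    pts = fg_coords[rng.choice(len(fg_coords), size=n_points, replace=False)]
    centroids = KMeans(n_clusters=k, n_init=1, max_iter=n_iter,
                       random_state=seed).fit(pts.astype(float)).cluster_centers_

    # Assigning each foreground pixel to its nearest centroid yields the
    # Voronoi partition of the foreground into single-cell instances.
    instances = np.zeros(shape, dtype=int)
    instances[fg_coords[:, 0], fg_coords[:, 1]] = cdist(fg_coords, centroids).argmin(1) + 1

    # 3-class semantic map: 0 = background, 1 = membrane, 2 = cell centroid.
    labels = np.zeros(shape, dtype=np.uint8)
    membranes = np.zeros(shape, dtype=bool)
    membranes[:-1, :] |= foreground[:-1, :] & (instances[:-1, :] != instances[1:, :])
    membranes[:, :-1] |= foreground[:, :-1] & (instances[:, :-1] != instances[:, 1:])
    labels[membranes] = 1
    for cy, cx in np.round(centroids).astype(int):
        labels[cy, cx] = 2
    return labels
```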
2.2 Image Synthesis
To transfer the generated labels into a realistic-looking image domain, a cyclic generative adversarial network (cycleGAN) is utilized, which allows domain transfers to be performed without the need for paired examples [14]. The underlying framework contains two generator networks and two discriminator networks, which are trained in an adversarial way. Considering data from the label domain \(x_{L} \in \mathcal {X}_L\) and data from the image domain \(x_{I} \in \mathcal {X}_I\), the generators provide the two mappings \(G_{LI}: \mathcal {X}_L \mapsto \mathcal {X}_I\) and \(G_{IL}: \mathcal {X}_I \mapsto \mathcal {X}_L\). The discriminator \(D_L\) aims to discriminate between reference samples \(x_{L}\) and translated samples \(G_{IL}(x_{I})\), whereas discriminator \(D_I\) discriminates between reference samples \(x_{I}\) and translated samples \(G_{LI}(x_{L})\).
Network architectures for generators and discriminators are adapted from [14]: the generators comprise a residual-based architecture operating on input patches of size \(256\times 256\) pixels, and PatchGANs are used as discriminators.
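As a rough illustration of the discriminator side, a minimal PatchGAN could be structured as in the following PyTorch sketch; the layer widths, normalization choice and number of downsampling stages follow common cycleGAN implementations and are assumptions rather than the authors' exact configuration.

```python
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """Minimal PatchGAN discriminator sketch in the spirit of [14]."""
    def __init__(self, in_channels: int = 1, base: int = 64):
        super().__init__()
        layers = [nn.Conv2d(in_channels, base, 4, stride=2, padding=1),
                  nn.LeakyReLU(0.2, inplace=True)]
        channels = base
        for _ in range(2):  # two further downsampling blocks
            layers += [nn.Conv2d(channels, channels * 2, 4, stride=2, padding=1),
                       nn.InstanceNorm2d(channels * 2),
                       nn.LeakyReLU(0.2, inplace=True)]
            channels *= 2
        layers += [nn.Conv2d(channels, channels * 2, 4, stride=1, padding=1),
                   nn.InstanceNorm2d(channels * 2),
                   nn.LeakyReLU(0.2, inplace=True),
                   nn.Conv2d(channels * 2, 1, 4, stride=1, padding=1)]
        self.model = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Output is a spatial map of real/fake scores, one per image patch.
        return self.model(x)
```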
As introduced in [14], the framework is trained by optimizing different loss terms, namely the adversarial loss \(\mathcal {L}_{GAN}\), the cycle-consistency loss \(\mathcal {L}_{cyc}\) and the identity loss \(\mathcal {L}_{identity}\). For our approach, we rely on the original formulation of \(\mathcal {L}_{GAN}\) and \(\mathcal {L}_{identity}\), utilizing the L1 norm. We also keep the original formulation of \(\mathcal {L}_{cyc}\) for the image domain cycle \(\mathcal {X}_I \mapsto \mathcal {X}_L \mapsto \mathcal {X}_I\), but replace the cycle loss for the label domain cycle \(\mathcal {X}_L \mapsto \mathcal {X}_I \mapsto \mathcal {X}_L\) by a distance-weighted loss, similar to [1]. This is motivated by the fact that cell membranes are represented as thin lines, and even slight offsets cause large jumps in the L1 loss term, which impedes the training process and results in inaccuracies between membrane positions in the image domain \(\mathcal {X}_I\) and the label domain \(\mathcal {X}_L\). To encourage the generators to preserve exact correspondences between membrane positions, a distance map is generated based on the original labels \(x_{L}\), being minimal at membrane positions and maximal at cell centers and in background regions. This distance map \(w_{dist}\) is utilized to weight the L1 distance between \(x_{L}\) and the translated label mask \(G_{IL}(G_{LI}(x_{L}))\), which formulates the cycle-consistency loss as
\[
\mathcal{L}_{cyc}(G_{LI}, G_{IL}) = \mathbb{E}_{x_L \sim p_{data}(x_L)}\left[\left\Vert w_{dist} \odot \left(G_{IL}(G_{LI}(x_L)) - x_L\right)\right\Vert_1\right] + \mathbb{E}_{x_I \sim p_{data}(x_I)}\left[\left\Vert G_{LI}(G_{IL}(x_I)) - x_I\right\Vert_1\right],
\]
with \(p_{data}\) denoting the data distributions. Instead of using the 3-class representation as input for the generators, only the binary mask of membrane locations is utilized, while background and centroid information are omitted. In this way, the task of the generators \(G_{LI}\) and \(G_{IL}\) can be interpreted as a transformation between the reconstructed membrane signal and the degraded representation captured by the microscope.
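A minimal sketch of the distance map \(w_{dist}\) and the resulting weighted L1 term is given below; the normalization, clipping and base weight of the distance map are illustrative assumptions, not values taken from the paper.

```python
import numpy as np
import torch
from scipy.ndimage import distance_transform_edt

def distance_weight(membrane_mask: np.ndarray, clip: float = 20.0,
                    base: float = 0.1) -> np.ndarray:
    """w_dist for a binary membrane mask: minimal on membranes, growing
    towards cell centers and background (clip and base weight are assumed)."""
    dist = distance_transform_edt(membrane_mask == 0)
    return base + (1.0 - base) * np.clip(dist, 0.0, clip) / clip

def weighted_label_cycle_loss(x_l: torch.Tensor, x_l_rec: torch.Tensor,
                              w_dist: torch.Tensor) -> torch.Tensor:
    """Distance-weighted L1 term for the label-domain cycle X_L -> X_I -> X_L."""
    return torch.mean(w_dist * torch.abs(x_l_rec - x_l))
```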
3 Experiments
Experiments are based on 2D slices from 3D confocal microscopy image stacks of A. thaliana [12], which serve as the baseline data set \(\mathcal {D}_{orig}\) for validation. Three additional data sets were generated by the proposed method, with decreasing abstraction of structures and increasingly strong priors on structural appearance. Details of each data set are specified in the following.
3.1 No Correspondence (\(\mathcal {D}_{naive}\))
Cell populations in the considered data set roughly resemble a circular structure (Fig. 1). To this end, the first generated data set naively mimics the specimen's appearance by generating a circularly shaped foreground region. Within the foreground region, cell instances are generated by the method described in Sect. 2.1 utilizing the parameter tuple \((n_{points}=4000, n_{iter}=100, k=20)\). Subsequently, the cycleGAN approach described in Sect. 2.2 translates the generated labels to the image domain.
3.2 Global Shape Correspondence (\(\mathcal {D}_{global}\))
To incorporate more accurate specimen shapes, more realistic foreground regions are estimated from original samples of the image domain \(\mathcal {X}_I\), by intensity thresholding and morphological hole filling. Within the foreground region, cell instances are generated as described in Sect. 2.1 utilizing the parameter tuple \((n_{points}=4000, n_{iter}=100, k=20)\) and subsequently images are synthesized by the cycleGAN approach.
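One lightweight way to obtain such foreground masks is sketched below, combining Otsu thresholding with morphological hole filling; the smoothing and closing parameters are assumed values, not taken from the paper.

```python
import numpy as np
from scipy.ndimage import binary_closing, binary_fill_holes
from skimage.filters import gaussian, threshold_otsu

def estimate_foreground(image: np.ndarray) -> np.ndarray:
    """Rough specimen mask from a raw slice (smoothing and closing sizes assumed)."""
    smoothed = gaussian(image.astype(float), sigma=3)        # suppress noise before thresholding
    mask = smoothed > threshold_otsu(smoothed)               # intensity thresholding
    mask = binary_closing(mask, structure=np.ones((5, 5)))   # bridge small gaps between membranes
    return binary_fill_holes(mask)                           # fill cell interiors
```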
3.3 Local Structure Correspondence (\(\mathcal {D}_{local}\))
To further validate our approach, another data set is generated, which skips the label generation step and solely relies on original labels \(x_L \in \mathcal {X}_L\). Corresponding images are generated by the cycleGAN approach, which allows to investigate the rate of errors induced by the domain translation. Note that the translation is still trained in an unsupervised fashion, since we do not rely on paired data.
4 Results
The public data set [12] comprises a total of 124 image stacks from 6 different plants, with annotations obtained by an automatic method and subsequently corrected manually. Due to the high correlation between neighbouring slices along the z-axis, for each plant we randomly selected 200 2D slices at arbitrary z-locations, reducing the data set \(\mathcal {D}_{orig}\) to a total of 1200 2D samples. Each generated data set \(\mathcal {D}_{naive}\), \(\mathcal {D}_{global}\) and \(\mathcal {D}_{local}\), therefore, likewise comprised 200 generated samples per plant.
For evaluation, a three-fold cross-validation was performed, in each fold utilizing four plants for training the data generation and two for testing. First, the quality of the generated images was assessed by considering two different similarity measures. Second, the generated data sets were utilized for training a segmentation approach [2] and the final segmentation accuracies were compared to those obtained by only using manually annotated data.
Fig. 2. Boxplots of SSIM and NCC calculated between fake images of each fold's test set and the corresponding real images. Red lines indicate the median value and boxes extend from the first to the third quartile. Whiskers show the range of achieved values without considering outliers, which are represented as individual dots. (Color figure online)
Fig. 3. Multi-class mask of each data set, color coding background in red, membrane positions in blue and centroids in green (first row). The second row shows corresponding images, which are, except for the original data set, generated by the cycleGAN trained on the respectively generated labels. (Color figure online)
Fig. 4. Scores for segmentation of background, membranes and centroids, obtained by training on the respective domain. Red lines indicate the median value and boxes extend from the first to the third quartile. Whiskers show the range of achieved values without considering outliers, which are represented as individual dots. (Color figure online)
Fig. 5. Right column: multi-class segmentation results for the approach proposed in [2], trained on the respective domain. Left column: membrane predictions overlaid with the ground truth membrane segmentation (blue). Additionally, the raw image and the ground truth segmentation are shown in the first row. (Color figure online)
4.1 Image Quality Assessment
Similarity between real and fake data was evaluated by the structural similarity index (SSIM) and the normalized correlation coefficient (NCC). Following the scheme of the three-fold cross-validation, generated fake images of each test set were compared to the corresponding real images. As the quantitative results in Fig. 2 show, data generated from real labels exhibit the highest degree of similarity to real data. Small deviations from real labels lead to worse similarity scores, which is attributable to the missing correspondence between exact membrane positions in the real and generated label domain. Qualitative results depicted in Fig. 3 show visually appealing images for all generated data sets. Additionally, it becomes visible that unnatural mask shapes impede the learned correspondences between membrane positions in the label and image domain.
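For reference, the two similarity measures could be computed along the following lines; the exact SSIM parameterization used in the evaluation is not specified in the text, so default settings are assumed.

```python
import numpy as np
from skimage.metrics import structural_similarity

def ncc(a: np.ndarray, b: np.ndarray) -> float:
    """Normalized correlation coefficient between two images."""
    a = (a - a.mean()) / (a.std() + 1e-8)
    b = (b - b.mean()) / (b.std() + 1e-8)
    return float(np.mean(a * b))

def similarity_scores(real: np.ndarray, fake: np.ndarray):
    """SSIM and NCC between a real slice and the corresponding fake slice."""
    data_range = float(real.max() - real.min())
    return structural_similarity(real, fake, data_range=data_range), ncc(real, fake)
```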
4.2 Training with Synthesized Data
Utilizability of the generated data was assessed by training the segmentation approach presented in [2], which was adapted to work for 2D data. To train more general models, data augmentation was incorporated into the training process, including rotation, flipping, additive Gaussian noise and random intensity scaling in the range [0.5, 1]. For validation, one plant of each fold's test set was utilized for training and the second plant for testing, and vice versa. This way, it was ensured that overall five plants were used for training (four for data generation and one for segmentation) and the sixth plant was never seen during training. Note that the computation of segmentation scores for the sixth plant always considered real samples \(x_I \in \mathcal {X}_I\) and manual annotations \(x_L \in \mathcal {X}_L\), but no generated data.
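A sketch of the described augmentations is given below; restricting rotations to multiples of 90 degrees and the noise level of 0.05 are assumptions, only the intensity scaling range [0.5, 1] is stated in the text.

```python
import numpy as np

def augment(image: np.ndarray, labels: np.ndarray, rng=None):
    """Rotation, flipping, additive Gaussian noise and intensity scaling in [0.5, 1]."""
    rng = rng or np.random.default_rng()
    k = int(rng.integers(4))                                 # random 90-degree rotation
    image, labels = np.rot90(image, k).copy(), np.rot90(labels, k).copy()
    if rng.random() < 0.5:                                   # random horizontal flip
        image, labels = np.fliplr(image).copy(), np.fliplr(labels).copy()
    image = image + rng.normal(0.0, 0.05, image.shape)       # additive Gaussian noise (sigma assumed)
    image = image * rng.uniform(0.5, 1.0)                    # random intensity scaling in [0.5, 1]
    return image, labels
```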
Metrics were adapted from [2] and comprised the regular F1-score for background predictions, a boundary F1-score allowing a safety margin of two pixels around each membrane for membrane predictions, and a local maximum-based detection accuracy for centroid detection. As a baseline, the segmentation approach was trained on \(\mathcal {D}_{orig}\), also deploying the policy of considering one plant for training and one for testing. Quantitative results obtained by utilizing the generated data sets for training are shown in Fig. 4, which, analogously to the obtained similarity scores, show increasing prediction accuracies for each class as more realistic data is utilized. This is also supported by the qualitative results depicted in Fig. 5.
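The membrane metric can be sketched as a boundary F1-score in which a predicted membrane pixel counts as correct if it lies within two pixels of a true membrane, and vice versa; this is one plausible reading of the described safety margin, not necessarily the exact implementation of [2].

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def boundary_f1(pred: np.ndarray, gt: np.ndarray, tolerance: int = 2) -> float:
    """Boundary F1-score with a pixel tolerance around each membrane."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    # Distance of every pixel to the nearest membrane pixel of each mask.
    dist_to_gt = distance_transform_edt(~gt)
    dist_to_pred = distance_transform_edt(~pred)
    # Precision: predicted membrane pixels within the tolerance of the ground truth.
    precision = np.sum(pred & (dist_to_gt <= tolerance)) / max(pred.sum(), 1)
    # Recall: ground truth membrane pixels within the tolerance of the prediction.
    recall = np.sum(gt & (dist_to_pred <= tolerance)) / max(gt.sum(), 1)
    return 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)
```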
5 Conclusion
In this paper, an approach towards annotation-free segmentation of fluorescently labeled cell membranes was proposed. The concept for label generation demonstrated that, even with small correspondences to the real image domain, plausible images could be generated. Training a segmentation approach with generated data revealed that at least small correspondences between real and generated images had to be included in order to obtain reliable segmentations. Especially for the naive approach, inaccurate correspondences between membrane positions in the label and image domain impeded the training process and resulted in vague segmentations. Although the loss was modified to penalize offsets of membrane positions in both domains, the offset increased with increasing abstraction of membrane labels. However, the trade-off between less accurate label predictions and the necessity for manual interaction has to be considered, since training without manual labels not only allows creating training data sets of arbitrary size, but also completely eliminates the need for tedious and time-consuming manual interactions. In general, the results are a promising first step and we plan to further improve the domain correspondences for the naive approach and to extend the concept to the generation of realistic 3D data.
References
Caliva, F., Iriondo, C., Martinez, A.M., Majumdar, S., Pedoia, V.: Distance map loss penalty term for semantic segmentation. In: International Conference on Medical Imaging with Deep Learning - Extended Abstract Track (2019)
Eschweiler, D., Spina, T.V., Choudhury, R.C., Meyerowitz, E., Cunha, A., Stegmaier, J.: CNN-based preprocessing to optimize watershed-based cell segmentation in 3D confocal microscopy images. In: Proceedings of the IEEE International Symposium on Biomedical Imaging, pp. 223–227 (2019)
Fernandez, R., et al.: Imaging plant growth in 4D: robust tissue reconstruction and lineaging at cell resolution. Nat. Methods 7, 547 (2010)
Goldsborough, P., Pawlowski, N., Caicedo, J.C., Singh, S., Carpenter, A.E.: CytoGAN: generative modeling of cell images. In: bioRxiv, p. 227645 (2017)
Majurski, M., et al.: Cell image segmentation using generative adversarial networks, transfer learning, and augmentations. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (2019)
Mosaliganti, K.R., Noche, R.R., Xiong, F., Swinburne, I.A., Megason, S.G.: ACME: automated cell morphology extractor for comprehensive reconstruction of cell membranes. PLoS Comput. Biol. 8, e1002780 (2012)
Osokin, A., Chessel, A., Carazo-Salas, R.E., Vaggi, F.: GANs for biological image synthesis. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2233–2242 (2017)
Stegmaier, J., et al.: Generating semi-synthetic validation benchmarks for embryomics. In: Proceedings of the IEEE International Symposium on Biomedical Imaging, pp. 684–688 (2016)
Stegmaier, J., et al.: Cell segmentation in 3D confocal images using supervoxel merge-forests with CNN-based hypothesis selection. In: Proceedings of the IEEE International Symposium on Biomedical Imaging, pp. 382–386 (2018)
Svoboda, D., Kozubek, M., Stejskal, S.: Generation of digital phantoms of cell nuclei and simulation of image formation in 3D image cytometry. Cytometry Part A 75A(6), 494–509 (2009)
Weigert, M., Subramanian, K., Bundschuh, S.T., Myers, E.W., Kreysing, M.: Biobeam - multiplexed wave-optical simulations of light-sheet microscopy. PLoS Comput. Biol. 14(4), e1006079 (2018)
Willis, L., et al.: Cell size and growth regulation in the Arabidopsis thaliana apical stem cell niche. Proc. Natl. Acad. Sci. 113, 8238–8246 (2016)
Zhao, Z., Yang, L., Zheng, H., Guldner, I.H., Zhang, S., Chen, D.Z.: Deep learning based instance segmentation in 3D biomedical images using weak annotation. In: Frangi, A.F., Schnabel, J.A., Davatzikos, C., Alberola-López, C., Fichtinger, G. (eds.) MICCAI 2018. LNCS, vol. 11073, pp. 352–360. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00937-3_41
Zhu, J.Y., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2223–2232 (2017)