Abstract
Multimodal imaging and correlative analysis typically require image alignment. Contrastive learning can generate representations of multimodal images, reducing the challenging task of multimodal image registration to a monomodal one. Previous work has shown that additional supervision on intermediate layers in contrastive learning improves biomedical image classification. We evaluate whether a similar approach improves the representations learned for registration, and thereby boosts registration performance. We explore three approaches to adding contrastive supervision to the latent features of the bottleneck layer in the U-Nets encoding the multimodal images, and evaluate three different critic functions. Our results show that representations learned without additional supervision on latent features perform best in the downstream task of registration on two public biomedical datasets. We investigate the performance drop by exploiting recent insights into contrastive learning in classification and self-supervised learning. We visualize the spatial relations of the learned representations by means of multidimensional scaling, and show that additional supervision on the bottleneck layer can lead to partial dimensional collapse of the intermediate embedding space.
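The contrastive supervision described in the abstract can be illustrated with a minimal sketch. The snippet below is not the authors' implementation: it assumes two modality-specific encoders have already produced L2-normalized bottleneck feature vectors for a batch of paired patches, and computes an InfoNCE-style loss (one common choice of contrastive critic) over them.

```python
# Minimal sketch (not the paper's implementation) of an InfoNCE-style
# contrastive critic on paired bottleneck features. Assumes row i of
# z_a (modality A) is the positive pair of row i of z_b (modality B),
# and that both are L2-normalized.
import numpy as np

def info_nce(z_a, z_b, temperature=0.1):
    """InfoNCE loss over a batch of (N, D) paired embeddings."""
    logits = z_a @ z_b.T / temperature            # (N, N) similarity logits
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    # Row-wise log-softmax; the matching pair sits on the diagonal.
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

# Perfectly aligned embeddings yield a loss near zero; mismatched
# pairings are penalized.
z = np.eye(4)                                     # toy orthonormal embeddings
aligned = info_nce(z, z)
shuffled = info_nce(z, np.roll(z, 1, axis=0))
```

In the setting studied in the paper, such a loss would be applied to the U-Net bottleneck features in addition to the contrastive loss on the final representations; the choice of critic function (of which the paper compares three) determines how similarity between embeddings is scored.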
Supported by VINNOVA (projects 2017-02447, 2020-03611, 2021-01420) and the Centre for Interdisciplinary Mathematics (CIM), Uppsala University.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Wetzer, E., Lindblad, J., Sladoje, N. (2023). Can Representation Learning for Multimodal Image Registration be Improved by Supervision of Intermediate Layers?. In: Pertusa, A., Gallego, A.J., Sánchez, J.A., Domingues, I. (eds) Pattern Recognition and Image Analysis. IbPRIA 2023. Lecture Notes in Computer Science, vol 14062. Springer, Cham. https://doi.org/10.1007/978-3-031-36616-1_21
Print ISBN: 978-3-031-36615-4
Online ISBN: 978-3-031-36616-1