Abstract
Multimodal imaging and correlative analysis typically require image alignment. Contrastive learning can generate representations of multimodal images, reducing the challenging task of multimodal image registration to a monomodal one. Previous work has shown that additional supervision on intermediate layers in contrastive learning improves biomedical image classification. We evaluate whether a similar approach improves the representations learned for registration, and thereby boosts registration performance. We explore three approaches to adding contrastive supervision to the latent features of the bottleneck layer in the U-Nets encoding the multimodal images, and evaluate three different critic functions. Our results show that representations learned without additional supervision on latent features perform best in the downstream task of registration on two public biomedical datasets. We investigate the performance drop by exploiting recent insights into contrastive learning in classification and self-supervised learning. We visualize the spatial relations of the learned representations by means of multidimensional scaling, and show that additional supervision on the bottleneck layer can lead to partial dimensional collapse of the intermediate embedding space.
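The contrastive supervision described in the abstract can be illustrated with a minimal sketch. The snippet below is not the authors' implementation: it assumes two modality-specific encoders have already produced L2-normalized bottleneck feature vectors for a batch of paired patches, and computes an InfoNCE-style loss (one common choice of contrastive critic) over them.

```python
# Minimal sketch (not the paper's implementation) of an InfoNCE-style
# contrastive critic on paired bottleneck features. Assumes row i of
# z_a (modality A) is the positive pair of row i of z_b (modality B),
# and that both are L2-normalized.
import numpy as np

def info_nce(z_a, z_b, temperature=0.1):
    """InfoNCE loss over a batch of (N, D) paired embeddings."""
    logits = z_a @ z_b.T / temperature            # (N, N) similarity logits
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    # Row-wise log-softmax; the matching pair sits on the diagonal.
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

# Perfectly aligned embeddings yield a loss near zero; mismatched
# pairings are penalized.
z = np.eye(4)                                     # toy orthonormal embeddings
aligned = info_nce(z, z)
shuffled = info_nce(z, np.roll(z, 1, axis=0))
```

In the setting studied in the paper, such a loss would be applied to the U-Net bottleneck features in addition to the contrastive loss on the final representations; the choice of critic function (of which the paper compares three) determines how similarity between embeddings is scored.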
Supported by VINNOVA (projects 2017-02447, 2020-03611, 2021-01420) and the Centre for Interdisciplinary Mathematics (CIM), Uppsala University.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Wetzer, E., Lindblad, J., Sladoje, N. (2023). Can Representation Learning for Multimodal Image Registration be Improved by Supervision of Intermediate Layers?. In: Pertusa, A., Gallego, A.J., Sánchez, J.A., Domingues, I. (eds) Pattern Recognition and Image Analysis. IbPRIA 2023. Lecture Notes in Computer Science, vol 14062. Springer, Cham. https://doi.org/10.1007/978-3-031-36616-1_21
Print ISBN: 978-3-031-36615-4
Online ISBN: 978-3-031-36616-1