Can Representation Learning for Multimodal Image Registration be Improved by Supervision of Intermediate Layers?

  • Conference paper
Pattern Recognition and Image Analysis (IbPRIA 2023)

Abstract

Multimodal imaging and correlative analysis typically require image alignment. Contrastive learning can generate representations of multimodal images, reducing the challenging task of multimodal image registration to a monomodal one. Previously, additional supervision on intermediate layers in contrastive learning has been shown to improve biomedical image classification. We evaluate whether a similar approach improves the representations learned for registration, and thereby boosts registration performance. We explore three approaches to adding contrastive supervision to the latent features of the bottleneck layer of the U-Nets encoding the multimodal images, and evaluate three different critic functions. Our results show that representations learned without additional supervision on latent features perform best in the downstream task of registration on two public biomedical datasets. We investigate the performance drop by exploiting recent insights into contrastive learning in classification and self-supervised learning. We visualize the spatial relations of the learned representations by means of multidimensional scaling, and show that additional supervision on the bottleneck layer can lead to partial dimensional collapse of the intermediate embedding space.

Supported by VINNOVA (projects 2017-02447, 2020-03611, 2021-01420) and the Centre for Interdisciplinary Mathematics (CIM), Uppsala University.
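The supervision scheme described in the abstract can be sketched as follows: an InfoNCE-style critic is applied to the paired outputs of the two modality encoders, and the same critic is additionally applied to their bottleneck features, weighted into the total loss. This is a minimal NumPy sketch under stated assumptions — the function names (`info_nce`, `total_loss`), the cosine-similarity critic, and the `weight` hyperparameter are illustrative choices, not the authors' exact implementation (which uses U-Net encoders and several critic variants).

```python
import numpy as np

def info_nce(z_a, z_b, temperature=0.1):
    """InfoNCE critic: row i of z_a and row i of z_b form a positive pair;
    all other rows in the batch serve as negatives."""
    # L2-normalise rows so the critic is cosine similarity scaled by temperature
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature           # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))          # positives lie on the diagonal

def total_loss(out_a, out_b, bott_a, bott_b, weight=0.5):
    """Contrastive loss on final representations, plus weighted
    contrastive supervision on the bottleneck features."""
    return info_nce(out_a, out_b) + weight * info_nce(bott_a, bott_b)
```

In this sketch, setting `weight=0` recovers training with supervision only on the final representations — the configuration the paper reports as performing best for downstream registration.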



Author information

Correspondence to Elisabeth Wetzer.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Wetzer, E., Lindblad, J., Sladoje, N. (2023). Can Representation Learning for Multimodal Image Registration be Improved by Supervision of Intermediate Layers? In: Pertusa, A., Gallego, A.J., Sánchez, J.A., Domingues, I. (eds) Pattern Recognition and Image Analysis. IbPRIA 2023. Lecture Notes in Computer Science, vol 14062. Springer, Cham. https://doi.org/10.1007/978-3-031-36616-1_21

  • DOI: https://doi.org/10.1007/978-3-031-36616-1_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-36615-4

  • Online ISBN: 978-3-031-36616-1

  • eBook Packages: Computer Science (R0)
