Abstract
In Minimally Invasive Surgery (MIS), temporally consistent depth estimation is necessary for accurate intraoperative surgical navigation and robotic control. Despite the plethora of stereo depth estimation methods, estimating temporally consistent disparity remains challenging due to scene and camera dynamics. The aim of this paper is to introduce the StereoDiffusion framework for temporally consistent disparity estimation. For the first time, a latent diffusion model is incorporated into stereo depth estimation. Advancing existing diffusion-based depth estimation methods, StereoDiffusion uses prior knowledge to refine disparity. This prior is generated by using optical flow to warp the disparity map of the previous frame, producing a reprojected disparity map in the current frame that is then refined. For efficient inference, a reduced number of denoising steps and an efficient denoising scheduler are used. Extensive validation on MIS stereo datasets and comparison with state-of-the-art (SOTA) methods show that StereoDiffusion achieves the best performance and provides temporally consistent disparity estimation with high-fidelity details, despite having been trained on natural scenes only.
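As a rough illustration of the two steps described above, the sketch below (hypothetical code, not the authors' implementation) shows (1) warping the previous frame's disparity into the current frame with optical flow to form a disparity prior, and (2) refining that prior with a few deterministic DDIM-style denoising steps. The names `predict_noise`, `alpha_bars`, and the flow input are assumed placeholders; in StereoDiffusion the denoiser is a latent diffusion model and the warped prior would first be encoded into its latent space.

```python
# Minimal sketch, under assumptions stated above, of prior generation and
# few-step refinement. Not the authors' released code.
import torch
import torch.nn.functional as F


def warp_prev_disparity(prev_disp, flow_cur_to_prev):
    """Backward-warp disparity from frame t-1 into frame t.

    prev_disp:        (1, 1, H, W) disparity estimated at frame t-1
    flow_cur_to_prev: (1, 2, H, W) optical flow mapping frame-t pixels to t-1
                      (e.g. from a RAFT-style flow network; assumed input here)
    """
    _, _, h, w = prev_disp.shape
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=torch.float32),
        torch.arange(w, dtype=torch.float32),
        indexing="ij",
    )
    grid = torch.stack((xs, ys), dim=0).unsqueeze(0)      # (1, 2, H, W) pixel coords
    sample = grid + flow_cur_to_prev                      # where each pixel came from
    sample[:, 0] = 2.0 * sample[:, 0] / (w - 1) - 1.0     # normalise x to [-1, 1]
    sample[:, 1] = 2.0 * sample[:, 1] / (h - 1) - 1.0     # normalise y to [-1, 1]
    return F.grid_sample(prev_disp, sample.permute(0, 2, 3, 1), align_corners=True)


def refine_with_few_ddim_steps(prior, predict_noise, alpha_bars, num_steps=4):
    """Refine a noisy (latent) disparity prior with a short DDIM schedule.

    prior:         (1, C, H, W) noised encoding of the warped disparity prior
    predict_noise: callable (x_t, t) -> predicted noise, same shape as x_t
    alpha_bars:    (T,) cumulative noise-schedule products, decreasing in t
    """
    timesteps = torch.linspace(len(alpha_bars) - 1, 0, num_steps).long()
    x = prior
    for i, t in enumerate(timesteps):
        eps = predict_noise(x, t)
        a_t = alpha_bars[t]
        x0 = (x - (1 - a_t).sqrt() * eps) / a_t.sqrt()    # predicted clean sample
        if i + 1 < len(timesteps):
            a_prev = alpha_bars[timesteps[i + 1]]
            x = a_prev.sqrt() * x0 + (1 - a_prev).sqrt() * eps
        else:
            x = x0
    return x
```

Because the prior already carries most of the scene structure from the previous frame, only a handful of denoising steps are needed at inference, which is what makes the efficient scheduler practical in this setting.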
Acknowledgments
This work was supported by the Royal Society [URF\R\201014].
Ethics declarations
Disclosure of Interests
The authors have no competing interests to declare that are relevant to the content of this article.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Xu, H., Xu, C., Giannarou, S. (2024). StereoDiffusion: Temporally Consistent Stereo Depth Estimation with Diffusion Models. In: Linguraru, M.G., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2024. MICCAI 2024. Lecture Notes in Computer Science, vol 15006. Springer, Cham. https://doi.org/10.1007/978-3-031-72089-5_56
DOI: https://doi.org/10.1007/978-3-031-72089-5_56
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72088-8
Online ISBN: 978-3-031-72089-5
eBook Packages: Computer Science, Computer Science (R0)