StereoDiffusion: Temporally Consistent Stereo Depth Estimation with Diffusion Models

  • Conference paper
Medical Image Computing and Computer Assisted Intervention – MICCAI 2024 (MICCAI 2024)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 15006))

Abstract

In Minimally Invasive Surgery (MIS), temporally consistent depth estimation is necessary for accurate intraoperative surgical navigation and robotic control. Despite the plethora of stereo depth estimation methods, estimating temporally consistent disparity is still challenging due to scene and camera dynamics. The aim of this paper is to introduce the StereoDiffusion framework for temporally consistent disparity estimation. For the first time, a latent diffusion model is incorporated into stereo depth estimation. Advancing existing depth estimation methods based on diffusion models, StereoDiffusion uses prior knowledge to refine disparity. Prior knowledge is generated using optical flow to warp the disparity map of the previous frame and predict a reprojected disparity map in the current frame to be refined. For efficient inference, fewer denoising steps and an efficient denoising scheduler have been used. Extensive validation on MIS stereo datasets and comparison to state-of-the-art (SOTA) methods show that StereoDiffusion achieves the best performance and provides temporally consistent disparity estimation with high-fidelity details, despite having been trained on natural scenes only.
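The prior-generation step described in the abstract (warping the previous frame's disparity map with optical flow) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes a backward flow field, hypothetically named `flow_cur_to_prev`, that stores for each current-frame pixel the `(dx, dy)` offset to its location in the previous frame, and it uses plain bilinear sampling. The function name `warp_disparity` and the choice to mark out-of-frame pixels as NaN are also assumptions made for illustration. How the flow is estimated and how the warped map is then refined by the diffusion model are outside this sketch.

```python
import numpy as np


def warp_disparity(prev_disp, flow_cur_to_prev):
    """Backward-warp the previous frame's disparity into the current frame.

    prev_disp:        (H, W) disparity map of the previous frame.
    flow_cur_to_prev: (H, W, 2) flow; [..., 0] is dx and [..., 1] is dy,
                      pointing from each current pixel to the previous frame
                      (an assumed convention, not taken from the paper).
    Returns the warped (H, W) map (NaN where the source falls outside the
    image) and a boolean validity mask.
    """
    h, w = prev_disp.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)

    # Sub-pixel sampling locations in the previous frame.
    src_x = xs + flow_cur_to_prev[..., 0]
    src_y = ys + flow_cur_to_prev[..., 1]
    valid = (src_x >= 0) & (src_x <= w - 1) & (src_y >= 0) & (src_y <= h - 1)

    # Clip so that indexing is safe; invalid pixels are masked afterwards.
    sx = np.clip(src_x, 0, w - 1)
    sy = np.clip(src_y, 0, h - 1)
    x0 = np.floor(sx).astype(int)
    y0 = np.floor(sy).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = sx - x0
    wy = sy - y0

    # Bilinear interpolation of the four neighbouring disparities.
    top = (1 - wx) * prev_disp[y0, x0] + wx * prev_disp[y0, x1]
    bottom = (1 - wx) * prev_disp[y1, x0] + wx * prev_disp[y1, x1]
    warped = (1 - wy) * top + wy * bottom

    return np.where(valid, warped, np.nan), valid


# Toy usage: every current pixel maps one pixel to the right in the previous frame.
prev = np.arange(12, dtype=np.float64).reshape(3, 4)
flow = np.zeros((3, 4, 2))
flow[..., 0] = 1.0
prior, mask = warp_disparity(prev, flow)
```

The validity mask matters in practice because pixels that leave the field of view have no prior and would need to be handled separately before refinement. The sketch also copies disparity values unchanged, so it does not account for disparity changes caused by motion along the viewing direction.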

Acknowledgments

This work was supported by the Royal Society [URF\R\201014].

Author information

Corresponding author

Correspondence to Haozheng Xu.

Ethics declarations

Disclosure of Interests

The authors have no competing interests to declare that are relevant to the content of this article.

Electronic supplementary material

Supplementary material 1 (zip, 86970 KB)

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Xu, H., Xu, C., Giannarou, S. (2024). StereoDiffusion: Temporally Consistent Stereo Depth Estimation with Diffusion Models. In: Linguraru, M.G., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2024. MICCAI 2024. Lecture Notes in Computer Science, vol 15006. Springer, Cham. https://doi.org/10.1007/978-3-031-72089-5_56

  • DOI: https://doi.org/10.1007/978-3-031-72089-5_56

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-72088-8

  • Online ISBN: 978-3-031-72089-5

  • eBook Packages: Computer Science, Computer Science (R0)
