Multi-view Guidance for Self-supervised Monocular Depth Estimation on Laparoscopic Images via Spatio-Temporal Correspondence

  • Conference paper
Medical Image Computing and Computer Assisted Intervention – MICCAI 2023 (MICCAI 2023)

Abstract

This work proposes a self-supervised approach to monocular depth estimation in laparoscopic scenes. Previous methods predict the depth map of each image independently, ignoring both the spatial coherence within local regions and the temporal correlation between adjacent images. The proposed approach leverages this spatio-temporal coherence to cope with the textureless areas and homogeneous colors typical of such scenes. A multi-view depth estimation model is used to guide the monocular depth estimation model when predicting depth maps, and the minimum reprojection error is extended to construct a cost volume for the multi-view model from adjacent images. In addition, the 3D consistency of the point clouds back-projected from the predicted depth maps is optimized for the monocular model. To exploit spatial coherence, deformable patch-matching is introduced into both the monocular and the multi-view models to smooth depth maps within local regions. Finally, a cycled prediction learning scheme for view synthesis and relative poses is designed to fully exploit the temporal correlation between adjacent images. Experimental results show that the proposed method outperforms existing methods in both qualitative and quantitative evaluations. Our code is available at https://github.com/MoriLabNU/MGMDepthL.
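The back-projection and 3D consistency step described in the abstract can be illustrated with a short sketch. This is not the authors' implementation, and the abstract does not give the exact loss. The sketch assumes a pinhole camera with known intrinsics fx, fy, cx, cy, a given relative pose (R, t) that maps target-camera coordinates into the source camera, nearest-pixel correspondence, and the mean Euclidean distance as the consistency measure. The names backproject, transform and consistency_3d are hypothetical, and plain Python lists stand in for tensors.

```python
import math

def backproject(depth, fx, fy, cx, cy):
    """Lift an H x W depth map (list of lists) to camera-frame 3D points
    using the pinhole model: X = (u - cx) d / fx, Y = (v - cy) d / fy, Z = d."""
    points = []
    for v, row in enumerate(depth):
        for u, d in enumerate(row):
            points.append(((u - cx) * d / fx, (v - cy) * d / fy, d))
    return points

def transform(points, R, t):
    """Apply the rigid transform p' = R p + t (R: 3x3 nested lists, t: 3-tuple)."""
    return [
        tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))
        for p in points
    ]

def consistency_3d(depth_t, depth_s, R, t, fx, fy, cx, cy):
    """Assumed 3D consistency measure: move the target point cloud into the
    source frame, project each point to its nearest source pixel, back-project
    the source depth at that pixel, and average the point-to-point distances."""
    h, w = len(depth_s), len(depth_s[0])
    moved = transform(backproject(depth_t, fx, fy, cx, cy), R, t)
    total, count = 0.0, 0
    for x, y, z in moved:
        if z <= 0:  # point behind the source camera: no valid correspondence
            continue
        # Nearest-neighbour pixel lookup (no bilinear sampling in this sketch).
        u = round(fx * x / z + cx)
        v = round(fy * y / z + cy)
        if 0 <= u < w and 0 <= v < h:
            d = depth_s[v][u]
            q = ((u - cx) * d / fx, (v - cy) * d / fy, d)
            total += math.dist((x, y, z), q)
            count += 1
    return total / count if count else 0.0
```

With an identity pose and identical depth maps the measure is zero; any disagreement between the two predicted depths shows up as a positive 3D distance. A trainable version would use differentiable (bilinear) sampling instead of rounding to the nearest pixel.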
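The abstract states that the minimum reprojection error is extended to build a cost volume for the multi-view model, but it does not spell out the construction. The sketch below shows one plausible reading, labeled as an assumption: for each depth hypothesis, the photometric error maps of the adjacent (source) views are reduced to a per-pixel minimum, and stacking these slices gives a cost volume from which a winner-take-all depth can be read out. The helpers min_reprojection, cost_volume and winner_take_all are hypothetical, and the error maps are assumed to be precomputed.

```python
def min_reprojection(errors_per_source):
    """Per-pixel minimum over source views.
    errors_per_source: list of H x W error maps, one per adjacent image."""
    h, w = len(errors_per_source[0]), len(errors_per_source[0][0])
    return [
        [min(e[v][u] for e in errors_per_source) for u in range(w)]
        for v in range(h)
    ]

def cost_volume(errors_by_depth):
    """Assumed construction: errors_by_depth[k] holds the per-source error maps
    obtained when every pixel is placed at the k-th depth hypothesis.
    Each cost-volume slice is the per-pixel minimum reprojection error."""
    return [min_reprojection(errs) for errs in errors_by_depth]

def winner_take_all(volume, depths):
    """Pick, for each pixel, the depth hypothesis with the lowest cost."""
    h, w = len(volume[0]), len(volume[0][0])
    out = []
    for v in range(h):
        row = []
        for u in range(w):
            k = min(range(len(depths)), key=lambda i: volume[i][v][u])
            row.append(depths[k])
        out.append(row)
    return out
```

In a full pipeline the error maps would come from warping each adjacent image into the target view at every depth hypothesis and comparing it photometrically (for example with a mix of SSIM and L1). Taking the minimum rather than the mean over views keeps a pixel that is occluded in one view from dominating its cost.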

Acknowledgments

We extend our appreciation for the assistance provided through the JSPS Bilateral International Collaboration Grants; JST CREST Grant (JPMJCR20D5); MEXT/JSPS KAKENHI Grants (17H00867, 26108006, 21K19898); and the CIBoG initiative of Nagoya University, part of the MEXT WISE program.

Author information

Correspondence to Wenda Li or Kensaku Mori.

Electronic supplementary material

Supplementary material 1 (PDF, 203 KB)

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Li, W., Hayashi, Y., Oda, M., Kitasaka, T., Misawa, K., Mori, K. (2023). Multi-view Guidance for Self-supervised Monocular Depth Estimation on Laparoscopic Images via Spatio-Temporal Correspondence. In: Greenspan, H., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2023. MICCAI 2023. Lecture Notes in Computer Science, vol 14228. Springer, Cham. https://doi.org/10.1007/978-3-031-43996-4_41

  • DOI: https://doi.org/10.1007/978-3-031-43996-4_41

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-43995-7

  • Online ISBN: 978-3-031-43996-4

  • eBook Packages: Computer Science, Computer Science (R0)
