Geometric Constraints for Self-supervised Monocular Depth Estimation on Laparoscopic Images with Dual-task Consistency

Li, Wenda; Hayashi, Yuichiro; Oda, Masahiro; Kitasaka, Takayuki; Misawa, Kazunari; Mori, Kensaku

doi:10.1007/978-3-031-16440-8_45

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13434)

Included in the following conference series:

International Conference on Medical Image Computing and Computer-Assisted Intervention

Abstract

Depth values are essential information to automate surgical robots and achieve Augmented Reality technology for minimally invasive surgery. Although depth-pose self-supervised monocular depth estimation performs impressively for autonomous driving scenarios, it is more challenging to predict accurate depth values for laparoscopic images due to the following two aspects: (i) the laparoscope’s motions contain many rotations, leading to pose estimation difficulties for the depth-pose learning strategy; (ii) the smooth surface reduces photometric error even if the matching pixels are inaccurate between adjacent frames. This paper proposes a novel self-supervised monocular depth estimation method for laparoscopic images with geometric constraints. We predict the scene coordinates as an auxiliary task and construct dual-task consistency between the predicted depth maps and scene coordinates under a unified camera coordinate system to achieve pixel-level geometric constraints. We extend the pose estimation into a Siamese process to provide stronger and more balanced geometric constraints in a depth-pose learning strategy by leveraging the order of the adjacent frames in a video sequence. We also design a weight mask for depth estimation based on our consistency to alleviate the interference from predictions with low confidence. The experimental results showed that the proposed method outperformed the baseline on depth and pose estimation. Our code is available at https://github.com/MoriLabNU/GCDepthL.
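The ideas named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the repository linked above for that); it only shows one plausible reading under stated assumptions: a pinhole camera with intrinsics K, scene coordinates expressed in the same camera frame as the back-projected depth, an exponential form for the weight mask, and a forward/backward pose pair that should compose to the identity. All function names and the `scale` parameter are hypothetical.

```python
import numpy as np


def backproject_depth(depth, K):
    """Lift a depth map (H, W) to camera-frame 3D points (H, W, 3).

    Assumes a pinhole model: X = depth * K^-1 [u, v, 1]^T.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel column and row indices
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1).astype(np.float64)
    rays = pixels @ np.linalg.inv(K).T  # applies K^-1 to every homogeneous pixel
    return rays * depth[..., None]


def dual_task_consistency(depth, scene_coords, K):
    """Per-pixel Euclidean distance between back-projected depth and the
    predicted scene coordinates (both assumed to be in the camera frame)."""
    return np.linalg.norm(backproject_depth(depth, K) - scene_coords, axis=-1)


def weight_mask(error, scale=1.0):
    """Hypothetical confidence mask: 1 where the two tasks agree,
    decaying toward 0 as they disagree. The exponential form is an assumption."""
    return np.exp(-error / scale)


def siamese_pose_residual(T_fwd, T_bwd):
    """Frobenius norm of T_fwd @ T_bwd - I for 4x4 rigid transforms.

    Zero when the pose predicted for (frame a -> b) is the exact inverse
    of the pose predicted for (frame b -> a)."""
    return np.linalg.norm(T_fwd @ T_bwd - np.eye(4))


# Usage with synthetic inputs (all values are illustrative).
K = np.array([[100.0, 0.0, 32.0],
              [0.0, 100.0, 24.0],
              [0.0, 0.0, 1.0]])
depth = np.full((48, 64), 2.0)
coords = backproject_depth(depth, K)          # perfectly consistent coordinates
error = dual_task_consistency(depth, coords, K)
mask = weight_mask(error)

T_fwd = np.eye(4)
T_fwd[:3, 3] = [0.01, 0.0, 0.02]              # small translation between frames
T_bwd = np.linalg.inv(T_fwd)
residual = siamese_pose_residual(T_fwd, T_bwd)
```

In a training loop, a mask of this kind would typically multiply the per-pixel depth loss so that pixels where the two predictions disagree contribute less, and the pose residual would be added as a penalty term; how the paper actually weights and combines these terms is not stated in the abstract.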

Acknowledgments

The authors are grateful for the support from JST CREST Grant Number JPMJCR20D5; MEXT/JSPS KAKENHI Grant Numbers 17H00867, 26108006, and 21K19898; JSPS Bilateral International Collaboration Grants; and CIBoG program of Nagoya University from the MEXT WISE program.

Author information

Authors and Affiliations

Graduate School of Informatics, Nagoya University, Furou-cho, Chikusa-ku, Nagoya, Aichi, 464-8601, Japan
Wenda Li, Yuichiro Hayashi, Masahiro Oda & Kensaku Mori
Information and Communications, Nagoya University, Furou-cho, Chikusa-ku, Nagoya, Aichi, 464-8601, Japan
Masahiro Oda
Faculty of Information Science, Aichi Institute of Technology, Yakusacho, Toyota, Aichi, 470-0392, Japan
Takayuki Kitasaka
Aichi Cancer Center Hospital, Chikusa-ku, Nagoya, Aichi, 464-8681, Japan
Kazunari Misawa
Information Technology Center, Nagoya University, Furou-cho, Chikusa-ku, Nagoya, Aichi, 464-8601, Japan
Kensaku Mori
Research Center of Medical Bigdata, National Institute of Informatics, Hitotsubashi, Chiyoda-ku, Tokyo, 101-8430, Japan
Kensaku Mori

Corresponding author

Correspondence to Kensaku Mori.

Editor information

Editors and Affiliations

Rochester Institute of Technology, Rochester, NY, USA
Linwei Wang
Chinese University of Hong Kong, Hong Kong, Hong Kong
Qi Dou
University of Virginia, Charlottesville, VA, USA
P. Thomas Fletcher
National Center for Tumor Diseases (NCT/UCC), Dresden, Germany
Stefanie Speidel
Case Western Reserve University, Cleveland, OH, USA
Shuo Li

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 180 KB)

About this paper

Cite this paper

Li, W., Hayashi, Y., Oda, M., Kitasaka, T., Misawa, K., Mori, K. (2022). Geometric Constraints for Self-supervised Monocular Depth Estimation on Laparoscopic Images with Dual-task Consistency. In: Wang, L., Dou, Q., Fletcher, P.T., Speidel, S., Li, S. (eds) Medical Image Computing and Computer Assisted Intervention – MICCAI 2022. MICCAI 2022. Lecture Notes in Computer Science, vol 13434. Springer, Cham. https://doi.org/10.1007/978-3-031-16440-8_45

DOI: https://doi.org/10.1007/978-3-031-16440-8_45
Published: 16 September 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-16439-2
Online ISBN: 978-3-031-16440-8
eBook Packages: Computer Science, Computer Science (R0)
