Enhanced Scale-Aware Depth Estimation for Monocular Endoscopic Scenes with Geometric Modeling

Wei, Ruofeng; Li, Bin; Chen, Kai; Ma, Yiyao; Liu, Yunhui; Dou, Qi

doi:10.1007/978-3-031-72089-5_25

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15006)

Included in the following conference series:

International Conference on Medical Image Computing and Computer-Assisted Intervention

Abstract

Scale-aware monocular depth estimation poses a significant challenge in computer-aided endoscopic navigation. Existing depth estimation methods that do not consider geometric priors struggle to learn the absolute scale when trained on monocular endoscopic sequences. Additionally, conventional methods have difficulty accurately estimating details at tissue and instrument boundaries. In this paper, we tackle these problems by proposing a novel enhanced scale-aware framework that uses only monocular images, together with geometric modeling, for depth estimation. Specifically, we first propose a multi-resolution depth fusion strategy to enhance the quality of monocular depth estimation. To recover the precise scale between relative depth and real-world values, we further calculate the 3D poses of instruments in the endoscopic scenes using algebraic geometry based on image-only geometric primitives (i.e., the boundaries and tips of instruments). The 3D poses of the surgical instruments then enable scale recovery for the relative depth maps. By coupling the scale factors with relative depth estimation, the scale-aware depth of monocular endoscopic scenes can be estimated. We evaluate the pipeline on in-house endoscopic surgery videos and simulated data. The results demonstrate that our method can learn the absolute scale with geometric modeling and accurately estimate scale-aware depth for monocular scenes. Code is available at: https://github.com/med-air/MonoEndoDepth.
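The coupling of a scale factor with a relative depth map can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that); it only assumes that metric depths are already known at a few instrument pixels, for example from an estimated 3D instrument pose, and it takes the median ratio of metric to relative depth as the scale. The function names `recover_scale` and `apply_scale`, the median estimator, and the toy values are all hypothetical choices made for illustration.

```python
from statistics import median


def recover_scale(relative_depth, instrument_points):
    """Estimate one global scale s so that s * relative depth ~ metric depth.

    relative_depth: 2D list of relative (unitless) depth values.
    instrument_points: list of (row, col, metric_depth) tuples; the metric
        depths are assumed to come from an external source such as an
        estimated 3D instrument pose (an assumption of this sketch).
    """
    ratios = []
    for row, col, metric in instrument_points:
        rel = relative_depth[row][col]
        if rel > 0:  # skip invalid or zero relative depths
            ratios.append(metric / rel)
    if not ratios:
        raise ValueError("no valid instrument points to estimate scale from")
    # The median is used here as a simple outlier-tolerant choice.
    return median(ratios)


def apply_scale(relative_depth, scale):
    """Multiply every relative depth value by the recovered scale."""
    return [[scale * d for d in row] for row in relative_depth]


# Toy example: a 2x3 relative depth map and three instrument pixels
# whose metric depths (in millimetres, hypothetical) are known.
relative = [[0.5, 1.0, 2.0],
            [1.5, 0.8, 1.2]]
points = [(0, 0, 25.0), (0, 2, 100.0), (1, 0, 75.0)]

scale = recover_scale(relative, points)
metric_depth = apply_scale(relative, scale)
```

A single global scale is the simplest possible model; a per-frame or spatially varying scale would need more reference points and is outside the scope of this sketch.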

Acknowledgments

This work was supported in part by the Shenzhen Portion of Shenzhen-Hong Kong Science and Technology Innovation Cooperation Zone under HZQB-KCZYB-20200089, in part by the National Natural Science Foundation of China under Project No. 62322318, in part by the ANR/RGC Joint Research Scheme of the Research Grants Council of the Hong Kong Special Administrative Region, China and the French National Research Agency (Project No. A-CUHK402/23), and in part by Hong Kong Innovation and Technology Commission under Project No. PRP/026/22FX.

Author information

Authors and Affiliations

Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China
Ruofeng Wei, Kai Chen, Yiyao Ma & Qi Dou
Department of Mechanical and Automation Engineering, The Chinese University of Hong Kong, Hong Kong, China
Bin Li & Yunhui Liu

Corresponding author

Correspondence to Qi Dou.

Editor information

Editors and Affiliations

Children’s National Hospital/George Washington University, Washington, DC, USA
Marius George Linguraru
The Chinese University of Hong Kong, Hong Kong, China
Qi Dou
Technical University of Denmark, Kgs Lyngby, Denmark
Aasa Feragen
Imperial College London, London, UK
Stamatia Giannarou
Imperial College London, London, UK
Ben Glocker
Universitat de Barcelona, Barcelona, Spain
Karim Lekadir
Helmholtz Munich, Technical University of Munich and King’s College London, Munich, Germany
Julia A. Schnabel

Ethics declarations

Disclosure of Interests

The authors have no competing interests to declare that are relevant to the content of this article.

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 836 KB)

About this paper

Cite this paper

Wei, R., Li, B., Chen, K., Ma, Y., Liu, Y., Dou, Q. (2024). Enhanced Scale-Aware Depth Estimation for Monocular Endoscopic Scenes with Geometric Modeling. In: Linguraru, M.G., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2024. MICCAI 2024. Lecture Notes in Computer Science, vol 15006. Springer, Cham. https://doi.org/10.1007/978-3-031-72089-5_25

DOI: https://doi.org/10.1007/978-3-031-72089-5_25
Published: 03 October 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-72088-8
Online ISBN: 978-3-031-72089-5
eBook Packages: Computer Science, Computer Science (R0)
