Abstract
Depth values are essential for automating surgical robots and for enabling augmented reality (AR) in minimally invasive surgery. Self-supervised monocular depth estimation with a depth-pose learning strategy performs impressively in autonomous driving scenarios. Predicting accurate depth values for laparoscopic images is harder, for two reasons: (i) the laparoscope's motions contain many rotations, which makes pose estimation difficult for the depth-pose learning strategy; and (ii) smooth surfaces keep the photometric error low even when the matched pixels between adjacent frames are inaccurate. This paper proposes a novel self-supervised monocular depth estimation method for laparoscopic images with geometric constraints. We predict scene coordinates as an auxiliary task and construct dual-task consistency between the predicted depth maps and scene coordinates under a unified camera coordinate system, which yields pixel-level geometric constraints. We extend pose estimation into a Siamese process that leverages the order of adjacent frames in a video sequence, providing stronger and more balanced geometric constraints for the depth-pose learning strategy. We also design a weight mask for depth estimation based on this consistency to alleviate interference from low-confidence predictions. Experimental results show that the proposed method outperforms the baseline in both depth and pose estimation. Our code is available at https://github.com/MoriLabNU/GCDepthL.
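The geometric quantities named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation (the linked repository holds that), and it rests on assumptions the abstract does not state. These are: a pinhole camera with intrinsics fx, fy, cx, cy; scene coordinates predicted in the current frame's camera coordinate system; a soft weight exp(-e/sigma) standing in for the paper's weight mask; and a forward/backward pose cycle check (T_ab composed with T_ba should be the identity) as one possible reading of the Siamese pose constraint. All function names and the sigma parameter are hypothetical. Plain Python lists stand in for the batched tensors a real training pipeline would use.

```python
import math

# Hypothetical helpers sketching the geometric constraints described in the abstract.
# Assumptions (not from the source): pinhole intrinsics (fx, fy, cx, cy), scene
# coordinates expressed in the current frame's camera coordinate system, and maps
# stored as nested Python lists indexed [v][u].

def backproject(depth, fx, fy, cx, cy):
    """Lift a depth map to camera-frame 3D points: X = d * K^{-1} [u, v, 1]^T."""
    return [[((u - cx) * d / fx, (v - cy) * d / fy, d) for u, d in enumerate(row)]
            for v, row in enumerate(depth)]

def consistency_error(depth, scene_coords, fx, fy, cx, cy):
    """Per-pixel Euclidean distance between back-projected depth and predicted scene coordinates."""
    points = backproject(depth, fx, fy, cx, cy)
    return [[math.dist(p, s) for p, s in zip(prow, srow)]
            for prow, srow in zip(points, scene_coords)]

def confidence_weights(errors, sigma=0.1):
    """Assumed soft weight mask exp(-e / sigma): inconsistent pixels get weights near 0."""
    return [[math.exp(-e / sigma) for e in row] for row in errors]

def weighted_mean(values, weights):
    """Average a per-pixel loss map (e.g. a photometric error map) under the weight mask."""
    num = sum(v * w for vr, wr in zip(values, weights) for v, w in zip(vr, wr))
    den = sum(w for wr in weights for w in wr)
    return num / den if den > 0 else 0.0

def make_pose(theta, tx, ty, tz):
    """4x4 rigid transform: rotation by theta about the z-axis plus translation (tx, ty, tz)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0, tx], [s, c, 0.0, ty], [0.0, 0.0, 1.0, tz], [0.0, 0.0, 0.0, 1.0]]

def invert_pose(T):
    """Inverse of a rigid transform [R | t], which is [R^T | -R^T t]."""
    Rt = [[T[j][i] for j in range(3)] for i in range(3)]
    ti = [-sum(Rt[i][k] * T[k][3] for k in range(3)) for i in range(3)]
    return [Rt[0] + [ti[0]], Rt[1] + [ti[1]], Rt[2] + [ti[2]], [0.0, 0.0, 0.0, 1.0]]

def matmul4(a, b):
    """Product of two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]

def pose_cycle_error(T_ab, T_ba):
    """Assumed Siamese-style check: T_ab @ T_ba should be the identity.

    Returns the residual rotation angle (radians) and the residual translation norm.
    """
    M = matmul4(T_ab, T_ba)
    trace = M[0][0] + M[1][1] + M[2][2]
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # clamp against round-off
    return math.acos(cos_angle), math.hypot(M[0][3], M[1][3], M[2][3])
```

For example, take a 2×2 depth map with constant value 2.0 and fx = fy = 1, cx = cy = 0.5. Here `backproject` returns (-1, -1, 2) at the top-left pixel. A predicted scene coordinate of (-1, -1, 2.3) at that pixel gives a consistency error of about 0.3 and, with sigma = 0.1, a weight of about exp(-3) ≈ 0.05. That pixel then contributes little to a loss averaged with `weighted_mean`. The exponential form is only one choice. A hard threshold on the consistency error would serve the same purpose of suppressing low-confidence pixels.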
Acknowledgments
The authors are grateful for the support from JST CREST Grant Number JPMJCR20D5; MEXT/JSPS KAKENHI Grant Numbers 17H00867, 26108006, and 21K19898; JSPS Bilateral International Collaboration Grants; and CIBoG program of Nagoya University from the MEXT WISE program.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Li, W., Hayashi, Y., Oda, M., Kitasaka, T., Misawa, K., Mori, K. (2022). Geometric Constraints for Self-supervised Monocular Depth Estimation on Laparoscopic Images with Dual-task Consistency. In: Wang, L., Dou, Q., Fletcher, P.T., Speidel, S., Li, S. (eds) Medical Image Computing and Computer Assisted Intervention – MICCAI 2022. MICCAI 2022. Lecture Notes in Computer Science, vol 13434. Springer, Cham. https://doi.org/10.1007/978-3-031-16440-8_45
DOI: https://doi.org/10.1007/978-3-031-16440-8_45
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-16439-2
Online ISBN: 978-3-031-16440-8
eBook Packages: Computer Science, Computer Science (R0)