Deep Homography Prediction for Endoscopic Camera Motion Imitation Learning

Huber, Martin; Ourselin, Sébastien; Bergeles, Christos; Vercauteren, Tom

doi:10.1007/978-3-031-43996-4_21

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14228)

Included in the following conference series:

International Conference on Medical Image Computing and Computer-Assisted Intervention

Abstract

In this work, we investigate laparoscopic camera motion automation through imitation learning from retrospective videos of laparoscopic interventions. A novel method is introduced that learns to augment a surgeon’s behavior in image space through object-motion-invariant image registration via homographies. Contrary to existing approaches, no geometric assumptions are made and no depth information is necessary, enabling immediate translation to a robotic setup. Deviating from the dominant approach in the literature, which consists of following a surgical tool, we do not handcraft the objective and impose no priors on the surgical scene, allowing the method to discover unbiased policies. In this new research field, significant improvements are demonstrated over two baselines on the Cholec80 and HeiChole datasets, including an improvement of 47% over camera motion continuation. The method is further shown to predict camera motion correctly on the public motion classification labels of the AutoLaparo dataset. All code is made accessible on GitHub (https://github.com/RViMLab/homography_imitation_learning).
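
As a rough illustration of what describing camera motion "via homographies" can look like in image space, the sketch below builds a 3x3 homography from the displacements of the four image corners and applies it to a pixel. The four-corner displacement parameterization is a common choice in deep homography estimation and is an assumption here; the abstract does not state which parameterization the method uses. The function names (solve_linear, homography_from_four_points, warp_point) and the 640x480 image size are hypothetical and are not taken from the linked repository.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix, copied so the inputs are left untouched.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in reversed(range(n)):
        s = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - s) / m[row][row]
    return x


def homography_from_four_points(corners, displacements):
    """Homography H (with H[2][2] = 1) mapping each corner to corner + displacement."""
    rows, rhs = [], []
    for (x, y), (dx, dy) in zip(corners, displacements):
        u, v = x + dx, y + dy
        # Two linear equations per correspondence in the eight unknowns of H.
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        rhs.append(v)
    h = solve_linear(rows, rhs)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def warp_point(H, point):
    """Apply H to a pixel coordinate, including the projective division."""
    x, y = point
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


# Hypothetical 640x480 frame whose corners all shift by (10, -5) pixels,
# i.e. a pure image-space translation between two frames.
corners = [(0, 0), (640, 0), (640, 480), (0, 480)]
H = homography_from_four_points(corners, [(10, -5)] * 4)
center_after = warp_point(H, (320, 240))
```

With equal displacements at all four corners the recovered homography reduces to a pure translation, so the image center moves by the same offset. Unequal corner displacements would yield rotation, scaling, or perspective terms in H.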

C. Bergeles and T. Vercauteren—These authors contributed equally to this work.

Acknowledgements

This work was supported by core and project funding from the Wellcome/EPSRC [WT203148/Z/16/Z; NS/A000049/1; WT101957; NS/A000027/1]. This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 101016985 (FAROS project). TV is supported by a Medtronic/RAEng Research Chair [RCSRF1819\7\34]. SO and TV are co-founders and shareholders of Hypervision Surgical. TV holds shares in Mauna Kea Technologies.

Author information

Authors and Affiliations

School of Biomedical Engineering & Image Sciences, King’s College London, London, UK
Martin Huber, Sébastien Ourselin, Christos Bergeles & Tom Vercauteren

Corresponding author

Correspondence to Martin Huber.

Editor information

Editors and Affiliations

Icahn School of Medicine, Mount Sinai, NYC, NY, USA; Tel Aviv University, Tel Aviv, Israel
Hayit Greenspan
Emory University, Atlanta, GA, USA
Anant Madabhushi
Queen’s University, Kingston, ON, Canada
Parvin Mousavi
The University of British Columbia, Vancouver, BC, Canada
Septimiu Salcudean
Yale University, New Haven, CT, USA
James Duncan
IBM Research, San Jose, CA, USA
Tanveer Syeda-Mahmood
Johns Hopkins University, Baltimore, MD, USA
Russell Taylor

About this paper

Cite this paper

Huber, M., Ourselin, S., Bergeles, C., Vercauteren, T. (2023). Deep Homography Prediction for Endoscopic Camera Motion Imitation Learning. In: Greenspan, H., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2023. MICCAI 2023. Lecture Notes in Computer Science, vol 14228. Springer, Cham. https://doi.org/10.1007/978-3-031-43996-4_21

DOI: https://doi.org/10.1007/978-3-031-43996-4_21
Published: 01 October 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-43995-7
Online ISBN: 978-3-031-43996-4
eBook Packages: Computer Science, Computer Science (R0)
