Abstract
Purpose
Recent advances in computer vision and machine learning have resulted in endoscopic video-based solutions for dense reconstruction of the anatomy. To effectively use these systems in surgical navigation, a reliable image-based technique is required to constantly track the endoscopic camera’s position within the anatomy, despite frequent removal and re-insertion. In this work, we investigate the use of recent learning-based keypoint descriptors for six degree-of-freedom camera pose estimation in intraoperative endoscopic sequences and under changes in anatomy due to surgical resection.
Methods
Our method employs a dense structure from motion (SfM) reconstruction of the preoperative anatomy, obtained with a state-of-the-art patient-specific learning-based descriptor. During the reconstruction step, each estimated 3D point is associated with a descriptor. This information is employed in the intraoperative sequences to establish 2D–3D correspondences for Perspective-n-Point (PnP) camera pose estimation. We evaluate this method on six intraoperative sequences, obtained from two cadaveric subjects, that include anatomical modifications.
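The correspondence and pose steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the ratio-test threshold, and the descriptor dimensionality are assumptions made for illustration, and a plain linear (DLT) solver stands in for the PnP solver, with no outlier rejection such as RANSAC.

```python
import numpy as np

def match_2d_3d(query_desc, map_desc, ratio=0.8):
    """Match image descriptors to descriptors stored with the 3D points.

    Nearest-neighbour search with a ratio test (threshold is an assumption).
    Returns indices into the query keypoints and into the 3D points.
    """
    # Pairwise Euclidean distances, shape (n_query, n_map)
    d = np.linalg.norm(query_desc[:, None, :] - map_desc[None, :, :], axis=2)
    order = np.argsort(d, axis=1)
    best, second = order[:, 0], order[:, 1]
    rows = np.arange(len(query_desc))
    keep = d[rows, best] < ratio * d[rows, second]
    return rows[keep], best[keep]

def pose_from_correspondences(pts2d, pts3d, K):
    """Linear (DLT) estimate of R, t such that x ~ K (R X + t).

    Stand-in for a PnP solver; needs at least 6 non-coplanar points.
    """
    ones = np.ones((len(pts2d), 1))
    # Remove the intrinsics: normalised image coordinates (x, y, 1)
    xn = (np.linalg.inv(K) @ np.hstack([pts2d, ones]).T).T
    Xh = np.hstack([pts3d, ones])
    zeros = np.zeros_like(Xh)
    # Two linear equations per correspondence in the 12 entries of [R | t]
    A = np.vstack([
        np.hstack([Xh, zeros, -xn[:, :1] * Xh]),
        np.hstack([zeros, Xh, -xn[:, 1:2] * Xh]),
    ])
    _, _, vt = np.linalg.svd(A)
    P = vt[-1].reshape(3, 4)
    # Project the left 3x3 block onto the nearest rotation and fix the scale
    U, S, Vt = np.linalg.svd(P[:, :3])
    R = U @ Vt
    t = P[:, 3] / S.mean()
    if np.linalg.det(R) < 0:  # SVD returns P only up to sign
        R, t = -R, -t
    return R, t
```

Given intraoperative keypoints `pts2d` with descriptors `query_desc`, and preoperative points `pts3d` with descriptors `map_desc`, one would call `qi, mi = match_2d_3d(query_desc, map_desc)` followed by `pose_from_correspondences(pts2d[qi], pts3d[mi], K)`. In practice a robust PnP solver would be preferable, since descriptor matches in endoscopic images are likely to contain outliers.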
Results
This approach led to translation and rotation errors of 3.9 mm and 0.2 radians, respectively, with 21.86% of cameras localized, averaged over the six sequences. Compared with an additional learning-based descriptor (HardNet++), the selected descriptor achieves a higher percentage of localized cameras with similar pose estimation performance. We further discuss potential causes of error and limitations of the proposed approach.
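For reference, the reported quantities can be computed along the following lines. This is a sketch under assumptions: the abstract does not state the exact error definitions, so the translation error is taken here as the distance between camera centres and the rotation error as the geodesic angle between rotations. The function names are hypothetical.

```python
import numpy as np

def pose_errors(R_est, t_est, R_gt, t_gt):
    """Translation and rotation error between two world-to-camera poses.

    Assumed definitions: Euclidean distance between camera centres
    (same unit as t) and geodesic angle between rotations (radians).
    """
    c_est = -R_est.T @ t_est  # camera centre in world coordinates
    c_gt = -R_gt.T @ t_gt
    trans_err = np.linalg.norm(c_est - c_gt)
    cos_angle = (np.trace(R_est.T @ R_gt) - 1.0) / 2.0
    rot_err = np.arccos(np.clip(cos_angle, -1.0, 1.0))
    return trans_err, rot_err

def localized_percentage(n_localized, n_total):
    """Share of frames for which a pose could be estimated, in percent."""
    return 100.0 * n_localized / n_total
```

Averaging `pose_errors` over the localized frames of each sequence, and then over sequences, would give summary values of the kind reported above.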
Conclusion
Patient-specific learning-based descriptors can relocalize images that are well distributed across the inspected anatomy, even where the anatomy is modified. However, camera relocalization in endoscopic sequences remains a challenging problem, and future research is necessary to increase the robustness and accuracy of this technique.
Code availability
The source code is available at https://github.com/arcadelab/camera-relocalization.
Acknowledgements
Isabela Hernández acknowledges the support of the 2021 Uniandes-DeepMind Scholarship.
Funding
This work was funded in part by Johns Hopkins University internal funds and in part by NIH R01EB030511. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Ethics declarations
Conflict of interest
The authors have no competing interests to declare that are relevant to the content of this article.
Ethics approval
Not necessary for this work.
Consent to participate
This study was performed under the approved IRB00267324 protocol on non-living subjects, for which informed consent was not required.
About this article
Cite this article
Hernández, I., Soberanis-Mukul, R., Mangulabnan, J.E. et al. Investigating keypoint descriptors for camera relocalization in endoscopy surgery. Int J CARS 18, 1135–1142 (2023). https://doi.org/10.1007/s11548-023-02918-x
DOI: https://doi.org/10.1007/s11548-023-02918-x