Abstract
Deformable tracking and real-time estimation of 3D tissue motion are essential to enable automation and image guidance applications in robotically assisted surgery. Our model, Sparse Efficient Neural Depth and Deformation (SENDD), extends prior 2D tracking work to estimate flow in 3D space. SENDD introduces the novel contributions of learned detection and sparse per-point depth and 3D flow estimation, all with fewer than half a million parameters. SENDD does this by using graph neural networks over sparse keypoint matches to estimate both depth and 3D flow anywhere. We quantify and benchmark SENDD on a comprehensively labelled tissue dataset and compare it to an equivalent 2D flow model. SENDD performs comparably while enabling applications that 2D flow cannot. SENDD can track points and estimate depth at 10 fps on an NVIDIA RTX 4000 for 1280 tracked (query) points, and its cost scales linearly with the number of query points. SENDD enables multiple downstream applications that require estimation of 3D motion in stereo endoscopy.
This work was supported by Intuitive Surgical.
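To make the pipeline described above concrete, here is a minimal PyTorch sketch of the idea the abstract outlines: a small graph neural network over sparse keypoint matches that regresses per-point depth and 3D flow. This is an illustrative reconstruction under stated assumptions, not the authors' implementation; the module name `SparseFlowGNN`, the k-nearest-neighbour graph construction, and all dimensions are hypothetical.

```python
# A minimal sketch of a GNN over sparse keypoint matches that regresses
# per-point depth and 3D flow. Illustrative only; not the SENDD codebase.
import torch
import torch.nn as nn


def knn_graph(pts: torch.Tensor, k: int) -> torch.Tensor:
    """Indices of the k nearest neighbours of each 2D keypoint, shape (N, k)."""
    d = torch.cdist(pts, pts)                            # (N, N) pairwise distances
    return d.topk(k + 1, largest=False).indices[:, 1:]   # drop the self-match


class SparseFlowGNN(nn.Module):
    """One message-passing round over keypoints, then per-point output heads."""

    def __init__(self, feat_dim: int = 64, k: int = 8):
        super().__init__()
        self.k = k
        self.encode = nn.Linear(2 + feat_dim, feat_dim)  # position + descriptor
        self.message = nn.Sequential(
            nn.Linear(2 * feat_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, feat_dim))
        self.depth_head = nn.Linear(feat_dim, 1)         # per-point depth
        self.flow_head = nn.Linear(feat_dim, 3)          # per-point 3D flow

    def forward(self, kpts: torch.Tensor, desc: torch.Tensor):
        # kpts: (N, 2) matched keypoint positions; desc: (N, feat_dim) descriptors.
        h = self.encode(torch.cat([kpts, desc], dim=-1))  # (N, F) node features
        nbrs = knn_graph(kpts, self.k)                    # (N, k) neighbour indices
        # Mean-aggregate messages from each point's graph neighbours.
        msg = self.message(
            torch.cat([h.unsqueeze(1).expand(-1, self.k, -1), h[nbrs]], dim=-1))
        h = h + msg.mean(dim=1)
        return self.depth_head(h), self.flow_head(h)      # (N, 1), (N, 3)


# Usage with 1280 query points, matching the reported 10 fps benchmark setting.
model = SparseFlowGNN()
kpts, desc = torch.rand(1280, 2), torch.rand(1280, 64)
depth, flow = model(kpts, desc)
print(depth.shape, flow.shape)  # torch.Size([1280, 1]) torch.Size([1280, 3])
```

Because every query point only exchanges messages with a fixed number of graph neighbours, per-point cost is constant, which is consistent with the linear scaling in the number of tracked points that the abstract reports.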
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Schmidt, A., Mohareri, O., DiMaio, S., Salcudean, S.E. (2023). SENDD: Sparse Efficient Neural Depth and Deformation for Tissue Tracking. In: Greenspan, H., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2023. MICCAI 2023. Lecture Notes in Computer Science, vol 14228. Springer, Cham. https://doi.org/10.1007/978-3-031-43996-4_23
DOI: https://doi.org/10.1007/978-3-031-43996-4_23
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-43995-7
Online ISBN: 978-3-031-43996-4