Efficient Spatiotemporal Learning of Microscopic Video for Augmented Reality-Guided Phacoemulsification Cataract Surgery

Tu, Puxun; Ye, Hongfei; Young, Jeff; Xie, Meng; Zheng, Ce; Chen, Xiaojun

doi:10.1007/978-3-031-43990-2_64

Puxun Tu ORCID: orcid.org/0000-0003-4809-9081¹⁴,
Hongfei Ye¹⁵,
Jeff Young¹⁶,
Meng Xie¹⁵,
Ce Zheng¹⁵ &
…
Xiaojun Chen ORCID: orcid.org/0000-0002-0298-4491^14,17

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 14226))

Included in the following conference series:

International Conference on Medical Image Computing and Computer-Assisted Intervention

2901 Accesses

Abstract

Phacoemulsification cataract surgery (PCS) is typically performed under a surgical microscope and adhering to standard procedures. The success of this surgery depends heavily on the seniority and experience of the ophthalmologist performing it. In this study, we developed an augmented reality (AR) guidance system to enhance the intraoperative skills of ophthalmologists by proposing a two-stage spatiotemporal learning network for surgical microscope video recognition. In the first stage, we designed a multi-task network that recognizes surgical phases and segments the limbus region to extract limbus-focused spatial features. In the second stage, we developed a temporal pyramid-based spatiotemporal feature aggregation (TP-SFA) module that uses causal and dilated temporal convolution for smooth and online surgical phase recognition. To provide phase-specific AR guidance, we designed several intraoperative visual cues based on the parameters of the fitted limbus ellipse and the recognized surgical phase. The comparison experiments results indicate that our method outperforms several strong baselines in surgical phase recognition. Furthermore, ablation experiments show the positive effects of the multi-task feature extractor and TP-SFA module. Our developed system has the potential for clinical application in PCS to provide real-time intraoperative AR guidance.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 84.99; Price excludes VAT (USA)

Softcover Book: USD 109.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

Al Hajj, H., et al.: CATARACTS: challenge on automatic tool annotation for cataract surgery. Med. Image Anal. 52, 24–41 (2019)
Article Google Scholar
Chung, J., Gulcehre, C., Cho, K., Bengio, Y.: Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555 (2014)
Czempiel, T., et al.: TeCNO: surgical phase recognition with multi-stage temporal convolutional networks. In: Martel, A.L., et al. (eds.) MICCAI 2020. LNCS, vol. 12263, pp. 343–352. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-59716-0_33
Chapter Google Scholar
Czempiel, T., Paschali, M., Ostler, D., Kim, S.T., Busam, B., Navab, N.: OperA: attention-regularized transformers for surgical phase recognition. In: de Bruijne, M., et al. (eds.) MICCAI 2021. LNCS, vol. 12904, pp. 604–614. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-87202-1_58
Chapter Google Scholar
Day, A.C., Gore, D.M., Bunce, C., Evans, J.R.: Laser-assisted cataract surgery versus standard ultrasound phacoemulsification cataract surgery. Cochrane Database of Systematic Reviews (7) (2016)
Google Scholar
Eigen, D., Fergus, R.: Predicting depth, surface normals and semantic labels with a common multi-scale convolutional architecture. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2650–2658 (2015)
Google Scholar
Farha, Y.A., Gall, J.: MS-TCN: multi-stage temporal convolutional network for action segmentation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3575–3584 (2019)
Google Scholar
Gao, X., Jin, Y., Long, Y., Dou, Q., Heng, P.-A.: Trans-SVNet: accurate phase recognition from surgical videos via hybrid embedding aggregation transformer. In: de Bruijne, M., et al. (eds.) MICCAI 2021. LNCS, vol. 12904, pp. 593–603. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-87202-1_57
Chapter Google Scholar
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Google Scholar
Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)
Article Google Scholar
Jin, Y., et al.: SV-RCNet: workflow recognition from surgical videos using recurrent convolutional network. IEEE Trans. Med. Imaging 37(5), 1114–1126 (2017)
Article MathSciNet Google Scholar
Jin, Y., Long, Y., Gao, X., Stoyanov, D., Dou, Q., Heng, P.A.: Trans-SVNet: hybrid embedding aggregation transformer for surgical workflow analysis. Int. J. Comput. Assist. Radiol. Surg. 17(12), 2193–2202 (2022)
Article Google Scholar
Lea, C., Vidal, R., Reiter, A., Hager, G.D.: Temporal convolutional networks: a unified approach to action segmentation. In: Hua, G., Jégou, H. (eds.) ECCV 2016. LNCS, vol. 9915, pp. 47–54. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-49409-8_7
Chapter Google Scholar
Lee, J.S., Hou, C.H., Lin, K.K.: Surgical results of phacoemulsification performed by residents: a time-trend analysis in a teaching hospital from 2005 to 2021. J. Ophthalmol. 2022 (2022)
Google Scholar
Ma, L., Fei, B.: Comprehensive review of surgical microscopes: technology development and medical applications. J. Biomed. Opt. 26(1), 010901–010901 (2021)
Article Google Scholar
Nespolo, R.G., Yi, D., Cole, E., Valikodath, N., Luciano, C., Leiderman, Y.I.: Evaluation of artificial intelligence-based intraoperative guidance tools for phacoemulsification cataract surgery. JAMA Ophthalmol. 140(2), 170–177 (2022)
Article Google Scholar
Nespolo, R.G., Yi, D., Cole, E., Wang, D., Warren, A., Leiderman, Y.I.: Feature tracking and segmentation in real time via deep learning in vitreoretinal surgery-a platform for artificial intelligence-mediated surgical guidance. Ophthalmol. Retina 7(3), 236–242 (2022)
Article Google Scholar
Primus, M.J.: Frame-based classification of operation phases in cataract surgery videos. In: Schoeffmann, K., et al. (eds.) MMM 2018. LNCS, vol. 10704, pp. 241–253. Springer, Cham (2018). https://doi.org/10.1007/978-3-319-73603-7_20
Chapter Google Scholar
Quellec, G., Lamard, M., Cochener, B., Cazuguel, G.: Real-time task recognition in cataract surgery videos using adaptive spatiotemporal polynomials. IEEE Trans. Med. Imaging 34(4), 877–887 (2014)
Article Google Scholar
Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_28
Chapter Google Scholar
Russakovsky, O.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 115, 211–252 (2015)
Article MathSciNet Google Scholar
Twinanda, A.P., Shehata, S., Mutter, D., Marescaux, J., De Mathelin, M., Padoy, N.: EndoNet: a deep architecture for recognition tasks on laparoscopic videos. IEEE Trans. Med. Imaging 36(1), 86–97 (2016)
Article Google Scholar
Wang, W., et al.: Cataract surgical rate and socioeconomics: a global study. Invest. Ophthalmol. Vis. Sci. 57(14), 5872–5881 (2016)
Article Google Scholar
Yi, F., Yang, Y., Jiang, T.: Not end-to-end: explore multi-stage architecture for online surgical phase recognition. In: Proceedings of the Asian Conference on Computer Vision, pp. 2613–2628 (2022)
Google Scholar
Zhai, Y., et al.: Computer-aided intraoperative toric intraocular lens positioning and alignment during cataract surgery. IEEE J. Biomed. Health Inform. 25(10), 3921–3932 (2021)
Article Google Scholar
Zhao, W., Zhang, Z., Wang, Z., Guo, Y., Xie, J., Xu, X.: ECLNet: center localization of eye structures based on adaptive gaussian ellipse heatmap. Comput. Biol. Med. 153, 106485 (2023)
Article Google Scholar
Zou, X., Liu, W., Wang, J., Tao, R., Zheng, G.: ARST: auto-regressive surgical transformer for phase recognition from laparoscopic videos. Comput. Meth. Biomech. Biomed. Eng. Imaging Visual. 11, 1012–1018 (2022)
Google Scholar

Download references

Acknowledgements

This work was supported by grants from the National Natural Science Foundation of China (81971709; M-0019; 82011530141), the Foundation of Science and Technology Commission of Shanghai Municipality (20490740700; 22Y11911700), Shanghai Jiao Tong University Foundation on Medical and Technological Joint Science Research (YG2021ZD21; YG2021QN72; YG2022QN056; YG2023ZD19; YG2023ZD15), Hospital Funded Clinical Research, Xinhua Hospital Affiliated to Shanghai Jiao Tong University School of Medicine (21XJMR02), and the Funding of Xiamen Science and Technology Bureau (No. 3502Z20221012).

Author information

Authors and Affiliations

Institute of Biomedical Manufacturing and Life Quality Engineering, School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai, China
Puxun Tu & Xiaojun Chen
Department of Ophthalmology, Xinhua Hospital Affiliated to Shanghai Jiao Tong University School of Medicine, Shanghai, China
Hongfei Ye, Meng Xie & Ce Zheng
Department of Bioengineering, University of Texas at Dallas, Richardson, USA
Jeff Young
Institute of Medical Robotics, Shanghai Jiao Tong University, Shanghai, China
Xiaojun Chen

Authors

Puxun Tu
View author publications
You can also search for this author in PubMed Google Scholar
Hongfei Ye
View author publications
You can also search for this author in PubMed Google Scholar
Jeff Young
View author publications
You can also search for this author in PubMed Google Scholar
Meng Xie
View author publications
You can also search for this author in PubMed Google Scholar
Ce Zheng
View author publications
You can also search for this author in PubMed Google Scholar
Xiaojun Chen
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding authors

Correspondence to Ce Zheng or Xiaojun Chen .

Editor information

Editors and Affiliations

Icahn School of Medicine, Mount Sinai, NYC, NY, USA, Tel Aviv University, Tel Aviv, Israel
Hayit Greenspan
Emory University, Atlanta, GA, USA
Anant Madabhushi
Queen’s University, Kingston, ON, Canada
Parvin Mousavi
The University of British Columbia, Vancouver, BC, Canada
Septimiu Salcudean
Yale University, New Haven, CT, USA
James Duncan
IBM Research, San Jose, CA, USA
Tanveer Syeda-Mahmood
Johns Hopkins University, Baltimore, MD, USA
Russell Taylor

1 Electronic supplementary material

Supplementary material 1 (mp4 80263 KB)

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Tu, P., Ye, H., Young, J., Xie, M., Zheng, C., Chen, X. (2023). Efficient Spatiotemporal Learning of Microscopic Video for Augmented Reality-Guided Phacoemulsification Cataract Surgery. In: Greenspan, H., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2023. MICCAI 2023. Lecture Notes in Computer Science, vol 14226. Springer, Cham. https://doi.org/10.1007/978-3-031-43990-2_64

Download citation

DOI: https://doi.org/10.1007/978-3-031-43990-2_64
Published: 01 October 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-43989-6
Online ISBN: 978-3-031-43990-2
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Societies and partnerships

The Medical Image Computing and Computer Assisted Intervention Society (opens in a new tab)

Efficient Spatiotemporal Learning of Microscopic Video for Augmented Reality-Guided Phacoemulsification Cataract Surgery