Efficient Spatiotemporal Learning of Microscopic Video for Augmented Reality-Guided Phacoemulsification Cataract Surgery

  • Conference paper
Medical Image Computing and Computer Assisted Intervention – MICCAI 2023 (MICCAI 2023)

Abstract

Phacoemulsification cataract surgery (PCS) is typically performed under a surgical microscope and follows standard procedures. The success of the surgery depends heavily on the seniority and experience of the ophthalmologist performing it. In this study, we developed an augmented reality (AR) guidance system to enhance the intraoperative skills of ophthalmologists, built on a two-stage spatiotemporal learning network for surgical microscope video recognition. In the first stage, we designed a multi-task network that recognizes surgical phases and segments the limbus region to extract limbus-focused spatial features. In the second stage, we developed a temporal pyramid-based spatiotemporal feature aggregation (TP-SFA) module that uses causal and dilated temporal convolution for smooth, online surgical phase recognition. To provide phase-specific AR guidance, we designed several intraoperative visual cues based on the parameters of the fitted limbus ellipse and the recognized surgical phase. Comparison experiments indicate that our method outperforms several strong baselines in surgical phase recognition. Furthermore, ablation experiments show the positive effects of the multi-task feature extractor and the TP-SFA module. The developed system has the potential for clinical application in PCS to provide real-time intraoperative AR guidance.
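The abstract names causal and dilated temporal convolution as the building block for online phase recognition. As a rough illustration of that general technique only (not the authors' TP-SFA module, whose details are not given here), the sketch below implements a single causal dilated 1D convolution over per-frame features in NumPy. The function name, tensor shapes, and dilation value are assumptions made for the example.

```python
import numpy as np

def causal_dilated_conv1d(x, w, dilation=1):
    """Causal dilated 1D convolution over a feature sequence (illustrative only).

    x: array of shape (T, C_in), one feature vector per video frame.
    w: array of shape (K, C_in, C_out), one weight matrix per kernel tap.
    Returns y of shape (T, C_out), where y[t] depends only on
    x[t], x[t - dilation], ..., x[t - (K - 1) * dilation].
    """
    T, c_in = x.shape
    K, _, c_out = w.shape
    # Left-pad with zeros so the output at time t never sees frames after t.
    pad = (K - 1) * dilation
    x_pad = np.concatenate([np.zeros((pad, c_in)), x], axis=0)
    y = np.zeros((T, c_out))
    for k in range(K):
        # Tap k looks back k * dilation frames from the current frame.
        start = pad - k * dilation
        y += x_pad[start:start + T] @ w[k]
    return y

# Hypothetical example: 16 frames, 4 input channels, 2 output channels.
rng = np.random.default_rng(0)
features = rng.normal(size=(16, 4))
weights = rng.normal(size=(3, 4, 2))
out = causal_dilated_conv1d(features, weights, dilation=2)
```

Because the padding is applied only on the left, changing frames at or after time t leaves all outputs before t unchanged, which is the property that makes frame-by-frame (online) inference possible. Stacking such layers with increasing dilation widens the temporal receptive field without looking into the future.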

Acknowledgements

This work was supported by grants from the National Natural Science Foundation of China (81971709; M-0019; 82011530141), the Foundation of Science and Technology Commission of Shanghai Municipality (20490740700; 22Y11911700), Shanghai Jiao Tong University Foundation on Medical and Technological Joint Science Research (YG2021ZD21; YG2021QN72; YG2022QN056; YG2023ZD19; YG2023ZD15), Hospital Funded Clinical Research, Xinhua Hospital Affiliated to Shanghai Jiao Tong University School of Medicine (21XJMR02), and the Funding of Xiamen Science and Technology Bureau (No. 3502Z20221012).

Author information

Corresponding authors

Correspondence to Ce Zheng or Xiaojun Chen.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Tu, P., Ye, H., Young, J., Xie, M., Zheng, C., Chen, X. (2023). Efficient Spatiotemporal Learning of Microscopic Video for Augmented Reality-Guided Phacoemulsification Cataract Surgery. In: Greenspan, H., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2023. MICCAI 2023. Lecture Notes in Computer Science, vol 14226. Springer, Cham. https://doi.org/10.1007/978-3-031-43990-2_64

  • DOI: https://doi.org/10.1007/978-3-031-43990-2_64

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-43989-6

  • Online ISBN: 978-3-031-43990-2

  • eBook Packages: Computer Science, Computer Science (R0)