Abstract
Phacoemulsification cataract surgery (PCS) is typically performed under a surgical microscope following standard procedures. Its success depends heavily on the seniority and experience of the ophthalmologist performing it. In this study, we developed an augmented reality (AR) guidance system to enhance the intraoperative skills of ophthalmologists, built on a two-stage spatiotemporal learning network for surgical microscope video recognition. In the first stage, we designed a multi-task network that recognizes surgical phases and segments the limbus region to extract limbus-focused spatial features. In the second stage, we developed a temporal pyramid-based spatiotemporal feature aggregation (TP-SFA) module that uses causal and dilated temporal convolution for smooth, online surgical phase recognition. To provide phase-specific AR guidance, we designed several intraoperative visual cues based on the parameters of the fitted limbus ellipse and the recognized surgical phase. Comparison experiments indicate that our method outperforms several strong baselines in surgical phase recognition. Furthermore, ablation experiments show the positive effects of the multi-task feature extractor and the TP-SFA module. The developed system has the potential for clinical application in PCS to provide real-time intraoperative AR guidance.
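As an illustration of the causal, dilated temporal convolution mentioned above for online phase recognition, a minimal sketch in plain Python follows. It is not the authors' TP-SFA module: the single-channel signal, the function names (`causal_dilated_conv1d`, `stacked_causal_layers`, `receptive_field`), the kernel size, the dilations (1, 2, 4), and the ReLU-plus-residual stacking are all assumptions chosen for illustration. Real per-frame features would be multi-channel.

```python
def causal_dilated_conv1d(x, w, dilation=1):
    """Causal dilated 1-D convolution over a per-frame signal.

    x: list of floats, one value per video frame (single channel for brevity).
    w: list of K kernel weights; w[-1] multiplies the current frame.
    Output at frame t depends only on frames t, t - d, ..., t - (K - 1) * d.
    """
    K = len(w)
    y = []
    for t in range(len(x)):
        acc = 0.0
        for k in range(K):
            src = t - (K - 1 - k) * dilation  # look back, never forward
            if src >= 0:  # frames before the start of the video count as zero
                acc += w[k] * x[src]
        y.append(acc)
    return y


def stacked_causal_layers(x, kernels, dilations=(1, 2, 4)):
    """Stack causal layers with growing dilation (assumed design, not TP-SFA).

    Each layer applies ReLU to the convolution and adds a residual connection.
    """
    h = list(x)
    for w, d in zip(kernels, dilations):
        conv = causal_dilated_conv1d(h, w, d)
        h = [max(c, 0.0) + prev for c, prev in zip(conv, h)]
    return h


def receptive_field(kernel_size, dilations):
    """Number of frames (current plus past) that can influence one output."""
    return 1 + (kernel_size - 1) * sum(dilations)
```

Because every tap looks backward in time, the prediction for a frame can be emitted as soon as that frame arrives, which is what makes this kind of layer suitable for online use. Growing the dilation widens the temporal context cheaply: with the assumed kernel size of 3 and dilations (1, 2, 4), `receptive_field(3, (1, 2, 4))` covers 15 frames.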
Acknowledgements
This work was supported by grants from the National Natural Science Foundation of China (81971709; M-0019; 82011530141), the Foundation of Science and Technology Commission of Shanghai Municipality (20490740700; 22Y11911700), Shanghai Jiao Tong University Foundation on Medical and Technological Joint Science Research (YG2021ZD21; YG2021QN72; YG2022QN056; YG2023ZD19; YG2023ZD15), Hospital Funded Clinical Research, Xinhua Hospital Affiliated to Shanghai Jiao Tong University School of Medicine (21XJMR02), and the Funding of Xiamen Science and Technology Bureau (No. 3502Z20221012).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Tu, P., Ye, H., Young, J., Xie, M., Zheng, C., Chen, X. (2023). Efficient Spatiotemporal Learning of Microscopic Video for Augmented Reality-Guided Phacoemulsification Cataract Surgery. In: Greenspan, H., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2023. MICCAI 2023. Lecture Notes in Computer Science, vol 14226. Springer, Cham. https://doi.org/10.1007/978-3-031-43990-2_64
DOI: https://doi.org/10.1007/978-3-031-43990-2_64
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-43989-6
Online ISBN: 978-3-031-43990-2
eBook Packages: Computer Science, Computer Science (R0)