Abstract
We propose a visual-modality-based multimodal fusion approach for surgical phase recognition that overcomes the limited diversity of information in video alone, such as the mere presence of tools. The proposed method extracts a visual kinematics-based index that captures tool usage during surgery, such as tool movement and the spatial relations between tools. We further improve recognition performance with an effective fusion method that combines CNN-based visual features with this visual kinematics-based index. Because it encodes interactions between tools, the index aids understanding of the surgical procedure; moreover, unlike the kinematic data available only in robotic surgery, it can be extracted in any surgical environment. We applied the proposed methodology to two multimodal datasets and verified that it helps improve recognition performance in clinical environments.
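The fusion idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the choice of bounding-box-derived features (per-tool centroid movement and inter-tool distance as a stand-in for the "visual kinematics-based index"), and fusion by plain concatenation are all illustrative assumptions.

```python
import numpy as np

def kinematics_index(boxes_t0, boxes_t1):
    """Toy visual kinematics-based index from two tools' bounding boxes
    (x, y, w, h) at consecutive frames: per-tool centroid movement plus
    the inter-tool centroid distance (a crude relation between tools)."""
    c0 = boxes_t0[:, :2] + boxes_t0[:, 2:] / 2.0  # centroids at frame t
    c1 = boxes_t1[:, :2] + boxes_t1[:, 2:] / 2.0  # centroids at frame t+1
    movement = np.linalg.norm(c1 - c0, axis=1)    # how far each tool moved
    relation = np.linalg.norm(c1[0] - c1[1])      # distance between the tools
    return np.concatenate([movement, [relation]])

def fuse(cnn_feat, kin_idx):
    """Late fusion by concatenation; a learned fusion layer would
    replace this in a real model."""
    return np.concatenate([cnn_feat, kin_idx])

boxes_t0 = np.array([[10., 10., 20., 20.], [100., 50., 20., 20.]])
boxes_t1 = np.array([[13., 14., 20., 20.], [100., 50., 20., 20.]])
kin = kinematics_index(boxes_t0, boxes_t1)
fused = fuse(np.zeros(8), kin)  # np.zeros(8) stands in for a CNN feature vector
print(kin.shape, fused.shape)   # (3,) (11,)
```

Because such indices only need detected tool positions in the video frame, they can be computed for any endoscopic recording, not just robotic surgery with logged kinematics.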
Notes
- 1. Please refer to the supplementary material for class definition details and segmentation results on G40.
- 2. Please refer to the supplementary material for additional experimental results of Accuracy, mPrecision, mRecall, and mF1 on PETRAW.
Acknowledgement
This research was funded by the Ministry of Health & Welfare, Republic of Korea (grant number: 1465035498 / HI21C1753000022).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Park, B. et al. (2022). Visual Modalities Based Multimodal Fusion for Surgical Phase Recognition. In: Li, X., Lv, J., Huo, Y., Dong, B., Leahy, R.M., Li, Q. (eds) Multiscale Multimodal Medical Imaging. MMMI 2022. Lecture Notes in Computer Science, vol 13594. Springer, Cham. https://doi.org/10.1007/978-3-031-18814-5_2
Print ISBN: 978-3-031-18813-8
Online ISBN: 978-3-031-18814-5