Towards Accurate and Interpretable Surgical Skill Assessment: A Video-Based Method Incorporating Recognized Surgical Gestures and Skill Levels

Wang, Tianyu; Wang, Yijie; Li, Mian

doi:10.1007/978-3-030-59716-0_64

Part of the book series: Lecture Notes in Computer Science ((LNIP,volume 12263))

Included in the following conference series:

International Conference on Medical Image Computing and Computer-Assisted Intervention

7642 Accesses
18 Citations

Abstract

Nowadays, surgical skill assessment becomes increasingly important for surgical training, given the explosive growth of automation technologies. Existing work on skill score prediction is limited and deserves more promising outcomes. The challenges lie on complicated surgical tasks and new subjects as trial performers. Moreover, previous work mostly provides local feedback involving each individual video frame or clip that does not manifest human-interpretable semantics itself. To overcome these issues and facilitate more accurate and interpretable skill score prediction, we propose a novel video-based method incorporating recognized surgical gestures (segments) and skill levels (for both performers and gestures). Our method consists of two correlated multi-task learning frameworks. The main task of the first framework is to predict final skill scores of surgical trials and the auxiliary tasks are to recognize surgical gestures and to classify performers’ skills into self-proclaimed skill levels. The second framework, which is based on gesture-level features accumulated until the end of each previously identified gesture, incrementally generates running intermediate skill scores for feedback decoding. Experiments on JIGSAWS dataset show our first framework on C3D features pushes state-of-the-art prediction performance further to 0.83, 0.86 and 0.69 of Spearman’s correlation for the three surgical tasks under LOUO validation scheme. It even achieves 0.68 when generalizing across these tasks. For the second framework, additional gesture-level skill levels and captions are annotated by experts. The trend of predicted intermediate skill scores indicating problematic gestures is demonstrated as interpretable feedback. It turns out such trend resembles human’s scoring process.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 84.99; Price excludes VAT (USA)

Softcover Book: USD 109.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

1.
Additional annotations for JIGSAWS dataset can be accessed via request.
2.
Our code is available on https://github.com/gunnerwang/MTL-VF-and-IMTL-AGF.

References

Ahmidi, N., et al.: A dataset and benchmarks for segmentation and recognition of gestures in robotic surgery. IEEE Trans. Biomed. Eng. 64(9), 2025–2041 (2017)
Article Google Scholar
Benmansour, M., Handouzi, W., Malti, A.: A neural network architecture for automatic and objective surgical skill assessment. In: CISTEM, pp. 1–5. IEEE (2018)
Google Scholar
Birkmeyer, J.D., et al.: Surgical skill and complication rates after bariatric surgery. N. Engl. J. Med. 369, 1434–1442 (2013)
Article Google Scholar
DiPietro, R., Hager, G.D.: Automated surgical activity recognition with one labeled sequence. In: Shen, D., et al. (eds.) MICCAI 2019. LNCS, vol. 11768, pp. 458–466. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-32254-0_51
Chapter Google Scholar
Ershad, M., Rege, R., Majewicz, A.: Surgical skill level assessment using automatic feature extraction methods. In: Medical Imaging: Image-Guided Procedures, Robotic Interventions, and Modeling, vol. 10576 (2018)
Google Scholar
Fard, M.J., et al.: Machine learning approach for skill evaluation in robotic-assisted surgery. In: WCECS, vol. 1 (2016)
Google Scholar
Fard, M.J., et al.: Automated robot-assisted surgical skill evaluation: predictive analytics approach. Int. J. Med. Robot. Comput. Assist. Surg. 14(1), e1850 (2018)
Article Google Scholar
Farha, Y.A., Gall, J.: MS-TCN: multi-stage temporal convolutional network for action segmentation. In: CVPR, pp. 3575–3584. IEEE (2019)
Google Scholar
Ismail Fawaz, H., Forestier, G., Weber, J., Idoumghar, L., Muller, P.-A.: Evaluating surgical skills from kinematic data using convolutional neural networks. In: Frangi, A.F., Schnabel, J.A., Davatzikos, C., Alberola-López, C., Fichtinger, G. (eds.) MICCAI 2018. LNCS, vol. 11073, pp. 214–221. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00937-3_25
Chapter Google Scholar
Funke, I., Mees, S.T., Weitz, J., Speidel, S.: Video-based surgical skill assessment using 3D convolutional neural networks. IJCARS 14(7), 1217–1225 (2019)
Google Scholar
Funke, I., Bodenstedt, S., Oehme, F., von Bechtolsheim, F., Weitz, J., Speidel, S.: Using 3D convolutional neural networks to learn spatiotemporal features for automatic surgical gesture recognition in video. In: Shen, D., et al. (eds.) MICCAI 2019. LNCS, vol. 11768, pp. 467–475. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-32254-0_52
Chapter Google Scholar
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778. IEEE (2016)
Google Scholar
Karpathy, A., et al.: Large-scale video classification with convolutional neural networks. In: CVPR, pp. 1725–1732. IEEE (2014)
Google Scholar
Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491. IEEE (2018)
Google Scholar
Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: ICLR (2015)
Google Scholar
Lea, C., Reiter, A., Vidal, R., Hager, G.D.: Segmental spatiotemporal CNNs for fine-grained action segmentation. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9907, pp. 36–52. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46487-9_3
Chapter Google Scholar
Lea, C., Vidal, R., Reiter, A., Hager, G.D.: Temporal convolutional networks: a unified approach to action segmentation. In: Hua, G., Jégou, H. (eds.) ECCV 2016. LNCS, vol. 9915, pp. 47–54. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-49409-8_7
Chapter Google Scholar
Liu, D., Jiang, T.: Deep reinforcement learning for surgical gesture segmentation and classification. In: Frangi, A.F., Schnabel, J.A., Davatzikos, C., Alberola-López, C., Fichtinger, G. (eds.) MICCAI 2018. LNCS, vol. 11073, pp. 247–255. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00937-3_29
Chapter Google Scholar
Liu, D., Jiang, T., Wang, Y., Miao, R., Shan, F., Li, Z.: surgical skill assessment on in-vivo clinical data via the clearness of operating field. In: Shen, D., et al. (eds.) MICCAI 2019. LNCS, vol. 11768, pp. 476–484. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-32254-0_53
Chapter Google Scholar
Martin, J.A., et al.: Objective structured assessment of technical skill (OSATS) for surgical residents. Br. J. Surg. 84(2), 273–278 (1997)
Google Scholar
Parmar, P., Morris, B.T.: Learning to score olympic events. In: CVPR-W, pp. 20–28. IEEE (2017)
Google Scholar
Parmar, P., Morris, B.T.: Action quality assessment across multiple actions. In: WACV, pp. 1468–1476. IEEE (2019)
Google Scholar
Parmar, P., Morris, B.T.: What and how well you performed? A multitask learning approach to action quality assessment. In: CVPR, pp. 304–313. IEEE (2019)
Google Scholar
Paszke, A., et al.: Automatic differentiation in pytorch. In: NIPS-W (2017)
Google Scholar
Regenbogen, S., et al.: Patterns of technical error among surgical malpractice claims: an analysis of strategies to prevent injury to surgical patients. Ann. Surg. 246(5), 705–711 (2007)
Article Google Scholar
Russakovsky, O., et al.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 115, 211–252 (2015)
Article MathSciNet Google Scholar
Tao, L., Elhamifar, E., Khudanpur, S., Hager, G.D., Vidal, R.: Sparse hidden markov models for surgical gesture classification and skill evaluation. In: Abolmaesumi, P., Joskowicz, L., Navab, N., Jannin, P. (eds.) IPCAI 2012. LNCS, vol. 7330, pp. 167–177. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-30618-1_17
Chapter Google Scholar
Tran, D., Bourdev, L., Fergus, R., Torresani, L., Paluri, M.: Learning spatiotemporal features with 3D convolutional networks. In: ICCV, pp. 4489–4497. IEEE (2015)
Google Scholar
Wang, Z., Majewicz Fey, A.: Deep learning with convolutional neural network for objective skill evaluation in robot-assisted surgery. Int. J. Comput. Assist. Radiol. Surg. 13(12), 1959–1970 (2018). https://doi.org/10.1007/s11548-018-1860-1
Article Google Scholar
Xiang, X., Tian, Y., Reiter, A., Hager, G.D., Tran, T.D.: S3D: Stacking segmental P3D for action quality assessment. In: ICIP, pp. 928–932. IEEE (2018)
Google Scholar
Zhou, K., Qiao, Y., Xiang, T.: Deep reinforcement learning for unsupervised video summarization with diversity-representativeness reward. In: AAAI (2018)
Google Scholar
Zia, A., Essa, I.: Automated surgical skill assessment in RMIS training. Int. J. Comput. Assist. Radiol. Surg. 13(5), 731–739 (2018). https://doi.org/10.1007/s11548-018-1735-5
Article Google Scholar
Zia, A., Hung, A., Essa, I., Jarc, A.: Surgical activity recognition in robot-assisted radical prostatectomy using deep learning. In: Frangi, A.F., Schnabel, J.A., Davatzikos, C., Alberola-López, C., Fichtinger, G. (eds.) MICCAI 2018. LNCS, vol. 11073, pp. 273–280. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00937-3_32
Chapter Google Scholar
Zia, A., Sharma, Y., Bettadapura, V., Sarin, E.L., Essa, I.: Video and accelerometer-based motion analysis for automated surgical skills assessment. Int. J. Comput. Assist. Radiol. Surg. 13(3), 443–455 (2018). https://doi.org/10.1007/s11548-018-1704-z
Article Google Scholar

Download references

Acknowledgements

This work was supported in part by Science and Technology Commission of Shanghai Municipality under Grant No.: 18511105603. Special thanks go to Dr. Qiongjie Zhou’s team from Obstetrics and Gynecology Hospital affiliated to Fudan University for the help on extra annotations.

Author information

Authors and Affiliations

Shanghai Jiao Tong University, Shanghai, China
Tianyu Wang, Yijie Wang & Mian Li

Authors

Tianyu Wang
View author publications
You can also search for this author in PubMed Google Scholar
Yijie Wang
View author publications
You can also search for this author in PubMed Google Scholar
Mian Li
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Mian Li .

Editor information

Editors and Affiliations

University of Toronto, Toronto, ON, Canada
Anne L. Martel
The University of British Columbia, Vancouver, BC, Canada
Purang Abolmaesumi
University College London, London, UK
Danail Stoyanov
École Centrale de Nantes, Nantes, France
Diana Mateus
EURECOM, Biot, France
Maria A. Zuluaga
Chinese Academy of Sciences, Beijing, China
S. Kevin Zhou
Sorbonne University, Paris, France
Daniel Racoceanu
The Hebrew University of Jerusalem, Jerusalem, Israel
Leo Joskowicz

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 91 KB)

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Wang, T., Wang, Y., Li, M. (2020). Towards Accurate and Interpretable Surgical Skill Assessment: A Video-Based Method Incorporating Recognized Surgical Gestures and Skill Levels. In: Martel, A.L., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2020. MICCAI 2020. Lecture Notes in Computer Science(), vol 12263. Springer, Cham. https://doi.org/10.1007/978-3-030-59716-0_64

Download citation

DOI: https://doi.org/10.1007/978-3-030-59716-0_64
Published: 29 September 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59715-3
Online ISBN: 978-3-030-59716-0
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Societies and partnerships

The Medical Image Computing and Computer Assisted Intervention Society (opens in a new tab)