Towards Accurate and Interpretable Surgical Skill Assessment: A Video-Based Method Incorporating Recognized Surgical Gestures and Skill Levels

  • Conference paper
  • In: Medical Image Computing and Computer Assisted Intervention – MICCAI 2020 (MICCAI 2020)

Abstract

Surgical skill assessment is becoming increasingly important for surgical training, given the explosive growth of automation technologies. Existing work on skill score prediction is limited and leaves considerable room for improvement. The challenges lie in complicated surgical tasks and in new subjects serving as trial performers. Moreover, previous work mostly provides local feedback at the level of individual video frames or clips, which carries no human-interpretable semantics by itself. To overcome these issues and enable more accurate and interpretable skill score prediction, we propose a novel video-based method that incorporates recognized surgical gestures (segments) and skill levels (for both performers and gestures). Our method consists of two correlated multi-task learning frameworks. The main task of the first framework is to predict the final skill scores of surgical trials; the auxiliary tasks are to recognize surgical gestures and to classify performers' skills into self-proclaimed skill levels. The second framework, based on gesture-level features accumulated up to the end of each previously identified gesture, incrementally generates running intermediate skill scores for feedback decoding. Experiments on the JIGSAWS dataset show that our first framework, operating on C3D features, pushes state-of-the-art prediction performance to Spearman's correlations of 0.83, 0.86 and 0.69 for the three surgical tasks under the LOUO validation scheme, and it achieves 0.68 even when generalizing across these tasks. For the second framework, experts annotated additional gesture-level skill levels and captions. The trend of the predicted intermediate skill scores, which indicates problematic gestures, is demonstrated as interpretable feedback; this trend turns out to resemble a human rater's scoring process.
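To make the multi-task design concrete, below is a minimal, hypothetical PyTorch sketch of the first framework's idea: a shared temporal encoder over per-clip C3D features (4096-d fc activations) feeding three heads, namely the main skill-score regression plus the two auxiliary tasks (gesture recognition over JIGSAWS's 15 gestures and classification into the 3 self-proclaimed skill levels). The class and parameter names, and the choice of an LSTM aggregator, are illustrative assumptions rather than details of the authors' released implementation.

    # Minimal sketch, not the authors' method: shared encoder + three task heads.
    import torch
    import torch.nn as nn

    class MultiTaskSkillModel(nn.Module):
        def __init__(self, feat_dim=4096, num_gestures=15, num_levels=3):
            super().__init__()
            # Shared temporal aggregation over the sequence of per-clip C3D features.
            self.temporal = nn.LSTM(feat_dim, 256, batch_first=True)
            self.score_head = nn.Linear(256, 1)               # main task: trial skill score
            self.gesture_head = nn.Linear(256, num_gestures)  # auxiliary: per-clip gesture
            self.level_head = nn.Linear(256, num_levels)      # auxiliary: performer skill level

        def forward(self, clip_feats):            # clip_feats: (batch, time, feat_dim)
            h, _ = self.temporal(clip_feats)      # (batch, time, 256)
            score = self.score_head(h[:, -1])     # final hidden state -> trial score
            gestures = self.gesture_head(h)       # per-time-step gesture logits
            level = self.level_head(h[:, -1])     # performer's self-proclaimed level
            return score, gestures, level

    # Usage: two trials, each represented by 30 C3D clip features.
    model = MultiTaskSkillModel()
    feats = torch.randn(2, 30, 4096)
    score, gestures, level = model(feats)

Reading the score head off at every gesture boundary, rather than only at the end of the trial, would mirror the second framework's running intermediate scores. Predicted trial scores would then be compared with ground-truth ratings using Spearman's rank correlation (e.g. scipy.stats.spearmanr), the metric reported above.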


Notes

  1. Additional annotations for the JIGSAWS dataset can be accessed upon request.

  2. Our code is available at https://github.com/gunnerwang/MTL-VF-and-IMTL-AGF.

References

  1. Ahmidi, N., et al.: A dataset and benchmarks for segmentation and recognition of gestures in robotic surgery. IEEE Trans. Biomed. Eng. 64(9), 2025–2041 (2017)

  2. Benmansour, M., Handouzi, W., Malti, A.: A neural network architecture for automatic and objective surgical skill assessment. In: CISTEM, pp. 1–5. IEEE (2018)

  3. Birkmeyer, J.D., et al.: Surgical skill and complication rates after bariatric surgery. N. Engl. J. Med. 369, 1434–1442 (2013)

  4. DiPietro, R., Hager, G.D.: Automated surgical activity recognition with one labeled sequence. In: Shen, D., et al. (eds.) MICCAI 2019. LNCS, vol. 11768, pp. 458–466. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-32254-0_51

  5. Ershad, M., Rege, R., Majewicz, A.: Surgical skill level assessment using automatic feature extraction methods. In: Medical Imaging: Image-Guided Procedures, Robotic Interventions, and Modeling, vol. 10576 (2018)

  6. Fard, M.J., et al.: Machine learning approach for skill evaluation in robotic-assisted surgery. In: WCECS, vol. 1 (2016)

  7. Fard, M.J., et al.: Automated robot-assisted surgical skill evaluation: predictive analytics approach. Int. J. Med. Robot. Comput. Assist. Surg. 14(1), e1850 (2018)

  8. Farha, Y.A., Gall, J.: MS-TCN: multi-stage temporal convolutional network for action segmentation. In: CVPR, pp. 3575–3584. IEEE (2019)

  9. Ismail Fawaz, H., Forestier, G., Weber, J., Idoumghar, L., Muller, P.-A.: Evaluating surgical skills from kinematic data using convolutional neural networks. In: Frangi, A.F., Schnabel, J.A., Davatzikos, C., Alberola-López, C., Fichtinger, G. (eds.) MICCAI 2018. LNCS, vol. 11073, pp. 214–221. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00937-3_25

  10. Funke, I., Mees, S.T., Weitz, J., Speidel, S.: Video-based surgical skill assessment using 3D convolutional neural networks. IJCARS 14(7), 1217–1225 (2019)

  11. Funke, I., Bodenstedt, S., Oehme, F., von Bechtolsheim, F., Weitz, J., Speidel, S.: Using 3D convolutional neural networks to learn spatiotemporal features for automatic surgical gesture recognition in video. In: Shen, D., et al. (eds.) MICCAI 2019. LNCS, vol. 11768, pp. 467–475. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-32254-0_52

  12. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778. IEEE (2016)

  13. Karpathy, A., et al.: Large-scale video classification with convolutional neural networks. In: CVPR, pp. 1725–1732. IEEE (2014)

  14. Kendall, A., Gal, Y., Cipolla, R.: Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp. 7482–7491. IEEE (2018)

  15. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: ICLR (2015)

  16. Lea, C., Reiter, A., Vidal, R., Hager, G.D.: Segmental spatiotemporal CNNs for fine-grained action segmentation. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9907, pp. 36–52. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46487-9_3

  17. Lea, C., Vidal, R., Reiter, A., Hager, G.D.: Temporal convolutional networks: a unified approach to action segmentation. In: Hua, G., Jégou, H. (eds.) ECCV 2016. LNCS, vol. 9915, pp. 47–54. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-49409-8_7

  18. Liu, D., Jiang, T.: Deep reinforcement learning for surgical gesture segmentation and classification. In: Frangi, A.F., Schnabel, J.A., Davatzikos, C., Alberola-López, C., Fichtinger, G. (eds.) MICCAI 2018. LNCS, vol. 11073, pp. 247–255. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00937-3_29

  19. Liu, D., Jiang, T., Wang, Y., Miao, R., Shan, F., Li, Z.: Surgical skill assessment on in-vivo clinical data via the clearness of operating field. In: Shen, D., et al. (eds.) MICCAI 2019. LNCS, vol. 11768, pp. 476–484. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-32254-0_53

  20. Martin, J.A., et al.: Objective structured assessment of technical skill (OSATS) for surgical residents. Br. J. Surg. 84(2), 273–278 (1997)

  21. Parmar, P., Morris, B.T.: Learning to score Olympic events. In: CVPR-W, pp. 20–28. IEEE (2017)

  22. Parmar, P., Morris, B.T.: Action quality assessment across multiple actions. In: WACV, pp. 1468–1476. IEEE (2019)

  23. Parmar, P., Morris, B.T.: What and how well you performed? A multitask learning approach to action quality assessment. In: CVPR, pp. 304–313. IEEE (2019)

  24. Paszke, A., et al.: Automatic differentiation in PyTorch. In: NIPS-W (2017)

  25. Regenbogen, S., et al.: Patterns of technical error among surgical malpractice claims: an analysis of strategies to prevent injury to surgical patients. Ann. Surg. 246(5), 705–711 (2007)

  26. Russakovsky, O., et al.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 115, 211–252 (2015)

  27. Tao, L., Elhamifar, E., Khudanpur, S., Hager, G.D., Vidal, R.: Sparse hidden Markov models for surgical gesture classification and skill evaluation. In: Abolmaesumi, P., Joskowicz, L., Navab, N., Jannin, P. (eds.) IPCAI 2012. LNCS, vol. 7330, pp. 167–177. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-30618-1_17

  28. Tran, D., Bourdev, L., Fergus, R., Torresani, L., Paluri, M.: Learning spatiotemporal features with 3D convolutional networks. In: ICCV, pp. 4489–4497. IEEE (2015)

  29. Wang, Z., Majewicz Fey, A.: Deep learning with convolutional neural network for objective skill evaluation in robot-assisted surgery. Int. J. Comput. Assist. Radiol. Surg. 13(12), 1959–1970 (2018). https://doi.org/10.1007/s11548-018-1860-1

  30. Xiang, X., Tian, Y., Reiter, A., Hager, G.D., Tran, T.D.: S3D: Stacking segmental P3D for action quality assessment. In: ICIP, pp. 928–932. IEEE (2018)

  31. Zhou, K., Qiao, Y., Xiang, T.: Deep reinforcement learning for unsupervised video summarization with diversity-representativeness reward. In: AAAI (2018)

  32. Zia, A., Essa, I.: Automated surgical skill assessment in RMIS training. Int. J. Comput. Assist. Radiol. Surg. 13(5), 731–739 (2018). https://doi.org/10.1007/s11548-018-1735-5

  33. Zia, A., Hung, A., Essa, I., Jarc, A.: Surgical activity recognition in robot-assisted radical prostatectomy using deep learning. In: Frangi, A.F., Schnabel, J.A., Davatzikos, C., Alberola-López, C., Fichtinger, G. (eds.) MICCAI 2018. LNCS, vol. 11073, pp. 273–280. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00937-3_32

  34. Zia, A., Sharma, Y., Bettadapura, V., Sarin, E.L., Essa, I.: Video and accelerometer-based motion analysis for automated surgical skills assessment. Int. J. Comput. Assist. Radiol. Surg. 13(3), 443–455 (2018). https://doi.org/10.1007/s11548-018-1704-z


Acknowledgements

This work was supported in part by the Science and Technology Commission of Shanghai Municipality under Grant No. 18511105603. Special thanks go to Dr. Qiongjie Zhou's team from the Obstetrics and Gynecology Hospital affiliated with Fudan University for their help with the extra annotations.

Author information

Correspondence to Mian Li.

Electronic Supplementary Material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 91 KB)

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Wang, T., Wang, Y., Li, M. (2020). Towards Accurate and Interpretable Surgical Skill Assessment: A Video-Based Method Incorporating Recognized Surgical Gestures and Skill Levels. In: Martel, A.L., et al. (eds.) Medical Image Computing and Computer Assisted Intervention – MICCAI 2020. MICCAI 2020. Lecture Notes in Computer Science, vol. 12263. Springer, Cham. https://doi.org/10.1007/978-3-030-59716-0_64

  • DOI: https://doi.org/10.1007/978-3-030-59716-0_64

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-59715-3

  • Online ISBN: 978-3-030-59716-0

  • eBook Packages: Computer Science (R0)
