Abstract
Surgical skill assessment is becoming increasingly important for surgical training, given the rapid growth of automation technologies. Existing work on skill score prediction is limited and leaves room for improvement. The challenges lie in the complexity of surgical tasks and in generalizing to new subjects as trial performers. Moreover, previous work mostly provides local feedback tied to individual video frames or clips, which do not carry human-interpretable semantics by themselves. To overcome these issues and enable more accurate and interpretable skill score prediction, we propose a novel video-based method that incorporates recognized surgical gestures (segments) and skill levels (for both performers and gestures). Our method consists of two correlated multi-task learning frameworks. The main task of the first framework is to predict the final skill scores of surgical trials; its auxiliary tasks are to recognize surgical gestures and to classify performers’ skills into self-proclaimed skill levels. The second framework, which is based on gesture-level features accumulated until the end of each previously identified gesture, incrementally generates running intermediate skill scores for feedback decoding. Experiments on the JIGSAWS dataset show that our first framework, using C3D features, pushes state-of-the-art prediction performance further, to Spearman’s correlations of 0.83, 0.86 and 0.69 for the three surgical tasks under the LOUO (leave-one-user-out) validation scheme. It even achieves 0.68 when generalizing across these tasks. For the second framework, additional gesture-level skill levels and captions are annotated by experts. The trend of predicted intermediate skill scores, which indicates problematic gestures, is demonstrated as interpretable feedback. This trend turns out to resemble the human scoring process.
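The evaluation protocol named in the abstract, Spearman’s correlation under a leave-one-user-out split, can be illustrated with a minimal sketch. The function names, the toy performer IDs, and the toy scores below are hypothetical and are not taken from the paper or its released code. How the correlation is aggregated across folds is not stated in the abstract, so the sketch simply computes one correlation over a single list of predictions.

```python
from collections import defaultdict


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def louo_splits(performers):
    """Yield (held_out, train_idx, test_idx), holding out one performer at a time."""
    by_performer = defaultdict(list)
    for idx, performer in enumerate(performers):
        by_performer[performer].append(idx)
    for held_out in sorted(by_performer):
        test_idx = by_performer[held_out]
        train_idx = [i for i in range(len(performers)) if performers[i] != held_out]
        yield held_out, train_idx, test_idx


# Hypothetical toy data: one entry per trial (assumption, for illustration only).
performers = ["B", "B", "C", "C", "D", "D"]
true_scores = [10, 14, 20, 18, 25, 27]
pred_scores = [11.0, 13.5, 17.0, 19.0, 26.0, 24.0]

for held_out, train_idx, test_idx in louo_splits(performers):
    print(held_out, "train:", train_idx, "test:", test_idx)
print("rho =", round(spearman(true_scores, pred_scores), 3))
```

In a real experiment, a model would be fitted on `train_idx` and evaluated on `test_idx` in each fold, so that no trial from the held-out performer is seen during training. This is what makes the scheme a test of generalization to new subjects.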
Notes
1. Additional annotations for the JIGSAWS dataset can be accessed via request.
2. Our code is available at https://github.com/gunnerwang/MTL-VF-and-IMTL-AGF.
Acknowledgements
This work was supported in part by the Science and Technology Commission of Shanghai Municipality under Grant No. 18511105603. Special thanks go to Dr. Qiongjie Zhou’s team from the Obstetrics and Gynecology Hospital affiliated to Fudan University for their help with the extra annotations.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Wang, T., Wang, Y., Li, M. (2020). Towards Accurate and Interpretable Surgical Skill Assessment: A Video-Based Method Incorporating Recognized Surgical Gestures and Skill Levels. In: Martel, A.L., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2020. Lecture Notes in Computer Science, vol 12263. Springer, Cham. https://doi.org/10.1007/978-3-030-59716-0_64
DOI: https://doi.org/10.1007/978-3-030-59716-0_64
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59715-3
Online ISBN: 978-3-030-59716-0
eBook Packages: Computer Science (R0)