Abstract
Purpose
Automatic surgical skill assessment has recently received increasing attention, given the growing importance of surgical training. Assessment typically involves skill score prediction and subsequent feedback generation. Existing work on skill score prediction faces several challenges and leaves room for improvement. For feedback, most work identifies flaws at the granularity of video frames or clips; how to identify poorly performed gestures (segments), and how to provide good references for improvement, remains to be explored.
Methods
To address these problems, a novel method consisting of three correlated frameworks is proposed. The first framework learns to predict the final skill scores of surgical trials with two auxiliary tasks. The second framework learns to predict running intermediate skill scores that indicate problematic gestures, while the third framework explores optimal gesture sequences as references through a new Policy Gradient based formulation.
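The Policy Gradient formulation is only named here, not specified; for orientation, the core idea of such methods is the REINFORCE-style update, sketched below on a toy problem. Everything in this sketch is a hypothetical stand-in (a state-less categorical policy over a 4-gesture vocabulary, a reward that simply favours gesture 0) and is not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

N_GESTURES = 4                 # hypothetical gesture vocabulary size
theta = np.zeros(N_GESTURES)   # logits of a state-less categorical policy

def policy():
    # softmax over logits (numerically stable)
    e = np.exp(theta - theta.max())
    return e / e.sum()

def sample_sequence(length=5):
    return rng.choice(N_GESTURES, size=length, p=policy())

def reward(seq):
    # toy stand-in for a sequence-quality reward: favour gesture 0
    return float(np.mean(seq == 0))

# REINFORCE with a running-average baseline:
# theta += lr * (R - baseline) * grad log pi(seq)
lr, baseline = 0.1, 0.0
for _ in range(2000):
    seq = sample_sequence()
    R = reward(seq)
    baseline += 0.05 * (R - baseline)
    p = policy()
    grad = np.zeros(N_GESTURES)
    for g in seq:
        grad[g] += 1.0
    grad -= len(seq) * p           # gradient of sum_t log softmax(theta)[g_t]
    theta += lr * (R - baseline) * grad

# after training, the policy should concentrate on gesture 0
```

The baseline subtraction does not bias the gradient estimate but reduces its variance, which is what makes this simple update usable in practice.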
Results
Our method is evaluated on the JIGSAWS dataset. The first framework pushes state-of-the-art prediction performance to Spearman's correlations of 0.83, 0.86 and 0.69 for the three surgical tasks under the LOUO validation scheme. Moreover, the intermediate scores predicted by the second framework agree more closely with the experts' assessments. Finally, the gesture sequences generated by the third framework reflect the optimality of the gesture flow.
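The reported numbers are Spearman's rank correlations between predicted and expert scores. For readers unfamiliar with the metric, a minimal hand-rolled computation is shown below; the score values are made up for illustration and assume no ties (real evaluations would use a library routine that handles ties).

```python
def spearman_rho(x, y):
    """Spearman's rank correlation for tie-free samples:
    rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

predicted = [14.0, 21.5, 9.0, 17.0, 25.0]   # hypothetical predicted scores
expert    = [15, 22, 10, 24, 18]            # hypothetical expert scores

print(spearman_rho(predicted, expert))      # 0.6: two trials rank-swapped
```

A rho of 1.0 means the predicted ranking matches the expert ranking exactly; values near 0.83 (as reported) indicate strong but imperfect rank agreement.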
Conclusion
In summary, multi-task learning with semantic visual features successfully boosts skill score prediction performance, while exploiting gesture-level annotations and the score elements of the final skill score helps generate more interpretable feedback. The presented method is a step towards a complete loop of automated surgical training.
Availability of data and material
All data generated or analysed during this study are available from the corresponding author on reasonable request.
Notes
Our code is available at https://github.com/gunnerwang/Novel-Surgical-Skill-Assessment.
References
Ahmidi N, Tao L, Sefati S, Gao Y, Lea C, Haro BB, Zappella L, Khudanpur S, Vidal R, Hager GD (2017) A dataset and benchmarks for segmentation and recognition of gestures in robotic surgery. IEEE Trans Biomed Eng 64(9):2025–2041
Birkmeyer JD, Finks JF, O’Reilly A, Oerline M, Carlin AM, Nunn AR, Dimick J, Banerjee M, Birkmeyer NJO (2013) Surgical skill and complication rates after bariatric surgery. N Engl J Med 369:1434–1442
Farha YA, Gall J (2019) MS-TCN: Multi-stage temporal convolutional network for action segmentation. In: CVPR, pp. 3575–3584. IEEE
Fawaz HI, Forestier G, Weber J, Idoumghar L, Muller PA (2018) Evaluating surgical skills from kinematic data using convolutional neural networks. In: MICCAI, LNCS, vol 11073, pp 214–221. Springer, Cham
Funke I, Mees ST, Weitz J, Speidel S (2019) Video-based surgical skill assessment using 3D convolutional neural networks. IJCARS 14(7):1217–1225
He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: CVPR, pp 770–778. IEEE
Iyengar K, Dwyer G, Stoyanov D (2020) Investigating exploration for deep reinforcement learning of concentric tube robot control. IJCARS 15:1157–1165
Kendall A, Gal Y, Cipolla R (2018) Multi-task learning using uncertainty to weigh losses for scene geometry and semantics. In: CVPR, pp 7482–7491. IEEE
Kullback S, Leibler RA (1951) On information and sufficiency. Ann Math Stat 22:79–86
Liu D, Jiang T (2018) Deep reinforcement learning for surgical gesture segmentation and classification. In: MICCAI, LNCS, vol 11073, pp 247–255. Springer, Cham
Liu D, Jiang T, Wang Y, Miao R, Shan F, Li Z (2019) Surgical skill assessment on in-vivo clinical data via the clearness of operating field. In: MICCAI, LNCS, vol 11768, pp 476–484. Springer, Cham
Martin JA, Regehr G, Reznick R, Macrae H, Murnaghan J, Hutchison C, Brown M (1997) Objective structured assessment of technical skill (OSATS) for surgical residents. Br J Surg 84(2):273–278
Mikolov T, Sutskever I, Chen K, Corrado GS, Dean J (2013) Distributed representations of words and phrases and their compositionality. In: NIPS, pp 3111–3119
Napalkova L, Rozenblit JW, Hwang G, Hamilton AJ, Suantak L (2014) An optimal motion planning method for computer-assisted surgical training. Appl Soft Comput 24:889–899
Parmar P, Morris BT (2017) Learning to score olympic events. In: CVPR-W, pp 20–28. IEEE
Parmar P, Morris BT (2019) Action quality assessment across multiple actions. In: WACV, pp 1468–1476. IEEE
Parmar P, Morris BT (2019) What and how well you performed? A multitask learning approach to action quality assessment. In: CVPR, pp 304–313. IEEE
Sutton RS, McAllester DA, Singh SP, Mansour Y (2000) Policy gradient methods for reinforcement learning with function approximation. In: NIPS, pp 1057–1063
Tan X, Lee Y, Chng C, Lim K, Chui C (2020) Robot-assisted flexible needle insertion using universal distributional deep reinforcement learning. IJCARS 15:341–349
Tran D, Bourdev L, Fergus R, Torresani L, Paluri M (2015) Learning spatiotemporal features with 3D convolutional networks. In: ICCV, pp 4489–4497. IEEE
Wang T, Wang Y, Li M (2020) Towards accurate and interpretable surgical skill assessment: A video-based method incorporating recognized surgical gestures and skill levels. In: MICCAI 2020, LNCS, vol 12263, pp 668–678. Springer, Cham. https://doi.org/10.1007/978-3-030-59716-0_64
Wang Z, Fey AM (2018) Deep learning with convolutional neural network for objective skill evaluation in robot-assisted surgery. IJCARS 13(12):1959–1970
Xiang X, Tian Y, Reiter A, Hager GD, Tran TD (2018) S3D: Stacking segmental P3D for action quality assessment. In: ICIP, pp 928–932. IEEE
Zhou K, Qiao Y, Xiang T (2018) Deep reinforcement learning for unsupervised video summarization with diversity-representativeness reward. In: AAAI
Zia A, Essa I (2018) Automated surgical skill assessment in RMIS training. IJCARS 13(5):731–739
Zia A, Sharma Y, Bettadapura V, Sarin EL, Essa I (2018) Video and accelerometer-based motion analysis for automated surgical skills assessment. IJCARS 13(3):443–455
Acknowledgements
This work was supported in part by the Science and Technology Commission of Shanghai Municipality under Grant No. 18511105603. Special thanks go to Dr. Qiongjie Zhou's team from the Obstetrics and Gynecology Hospital affiliated to Fudan University for their help with the additional annotations.
Funding
This study was funded by the Science and Technology Commission of Shanghai Municipality (Grant No. 18511105603).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards.
Informed consent
This article does not contain patient data.
Code availability
The code used during the current study is available at https://github.com/gunnerwang/Novel-Surgical-Skill-Assessment.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This work extends our conference paper [21] by adding a new framework, PG-GS, for discovering optimal gesture sequences, together with corresponding experiments, and by organizing the three frameworks into a systematic surgical practice loop.
Cite this article
Wang, T., Jin, M. & Li, M. Towards accurate and interpretable surgical skill assessment: a video-based method for skill score prediction and guiding feedback generation. Int J CARS 16, 1595–1605 (2021). https://doi.org/10.1007/s11548-021-02448-4