Symmetric Dilated Convolution for Surgical Gesture Recognition

Zhang, Jinglu; Nie, Yinyu; Lyu, Yao; Li, Hailin; Chang, Jian; Yang, Xiaosong; Zhang, Jian Jun

doi:10.1007/978-3-030-59716-0_39

Jinglu Zhang¹⁶,
Yinyu Nie¹⁶,
Yao Lyu¹⁶,
Hailin Li¹⁷,
Jian Chang¹⁶,
Xiaosong Yang¹⁶ &
…
Jian Jun Zhang¹⁶

Part of the book series: Lecture Notes in Computer Science ((LNIP,volume 12263))

Included in the following conference series:

International Conference on Medical Image Computing and Computer-Assisted Intervention

7884 Accesses

Abstract

Automatic surgical gesture recognition is a prerequisite of intra-operative computer assistance and objective surgical skill assessment. Prior works either require additional sensors to collect kinematics data or have limitations on capturing temporal information from long and untrimmed surgical videos. To tackle these challenges, we propose a novel temporal convolutional architecture to automatically detect and segment surgical gestures with corresponding boundaries only using RGB videos. We devise our method with a symmetric dilation structure bridged by a self-attention module to encode and decode the long-term temporal patterns and establish the frame-to-frame relationship accordingly. We validate the effectiveness of our approach on a fundamental robotic suturing task from the JIGSAWS dataset. The experiment results demonstrate the ability of our method on capturing long-term frame dependencies, which largely outperform the state-of-the-art methods on the frame-wise accuracy up to $\sim $6 points and the F1@50 score $\sim $6 points.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Subscribe and save

Springer+ Basic

$34.99 /Month

Get 10 units per month
Download Article/Chapter or eBook
1 Unit = 1 Article or 1 Chapter
Cancel anytime

Buy Now

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 84.99; Price excludes VAT (USA)

Softcover Book: USD 109.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

SD-Net: joint surgical gesture recognition and skill assessment

Article Open access 16 October 2021

Using 3D Convolutional Neural Networks to Learn Spatiotemporal Features for Automatic Surgical Gesture Recognition in Video

Deep Reinforcement Learning for Surgical Gesture Segmentation and Classification

References

Ding, L., Xu, C.: Tricornet: a hybrid temporal convolutional and recurrent network for video action segmentation. arXiv preprint arXiv:1705.07818 (2017)
DiPietro, R., et al.: Recognizing surgical activities with recurrent neural networks. In: Ourselin, S., Joskowicz, L., Sabuncu, M.R., Unal, G., Wells, W. (eds.) MICCAI 2016. LNCS, vol. 9900, pp. 551–558. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46720-7_64
Chapter Google Scholar
Farha, Y.A., Gall, J.: Ms-tcn: multi-stage temporal convolutional network for action segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3575–3584 (2019)
Google Scholar
Funke, I., Bodenstedt, S., Oehme, F., von Bechtolsheim, F., Weitz, J., Speidel, S.: Using 3D convolutional neural networks to learn spatiotemporal features for automatic surgical gesture recognition in video. In: Shen, D., et al. (eds.) MICCAI 2019. LNCS, vol. 11768, pp. 467–475. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-32254-0_52
Chapter Google Scholar
Gao, Y., et al.: Jhu-isi gesture and skill assessment working set (jigsaws): a surgical activity dataset for human motion modeling. In: MICCAI Workshop: M2CAI, vol. 3, p. 3 (2014)
Google Scholar
Hu, H., Gu, J., Zhang, Z., Dai, J., Wei, Y.: Relation networks for object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3588–3597 (2018)
Google Scholar
Lea, C., Flynn, M.D., Vidal, R., Reiter, A., Hager, G.D.: Temporal convolutional networks for action segmentation and detection. In: proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 156–165 (2017)
Google Scholar
Lea, C., Hager, G.D., Vidal, R.: An improved model for segmentation and recognition of fine-grained activities with application to surgical training tasks. In: 2015 IEEE Winter Conference on Applications of Computer Vision, pp. 1123–1129. IEEE (2015)
Google Scholar
Liu, D., Jiang, T.: Deep reinforcement learning for surgical gesture segmentation and classification. In: Frangi, A.F., Schnabel, J.A., Davatzikos, C., Alberola-López, C., Fichtinger, G. (eds.) MICCAI 2018. LNCS, vol. 11073, pp. 247–255. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00937-3_29
Chapter Google Scholar
Maier-Hein, L.: Surgical data science for next-generation interventions. Nat. Biomed. Eng. 1(9), 691–696 (2017)
Article Google Scholar
Oord, A.V.d., et al.: Wavenet: a generative model for raw audio. arXiv preprint arXiv:1609.03499 (2016)
Pascanu, R., Mikolov, T., Bengio, Y.: On the difficulty of training recurrent neural networks. In: International Conference on Machine Learning, pp. 1310–1318 (2013)
Google Scholar
Singh, B., Marks, T.K., Jones, M., Tuzel, O., Shao, M.: A multi-stream bi-directional recurrent neural network for fine-grained action detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1961–1970 (2016)
Google Scholar
Tao, L., Zappella, L., Hager, G.D., Vidal, R.: Surgical gesture segmentation and recognition. In: Mori, K., Sakuma, I., Sato, Y., Barillot, C., Navab, N. (eds.) MICCAI 2013. LNCS, vol. 8151, pp. 339–346. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-40760-4_43
Chapter Google Scholar
Twinanda, A.P., Shehata, S., Mutter, D., Marescaux, J., De Mathelin, M., Padoy, N.: Endonet: a deep architecture for recognition tasks on laparoscopic videos. IEEE Trans. Med. Imaging 36(1), 86–97 (2016)
Article Google Scholar
Vaswani, A., et al.: Attention is all you need. In: Advances in Neural Information Processing Systems, pp. 5998–6008 (2017)
Google Scholar
Wang, X., Girshick, R., Gupta, A., He, K.: Non-local neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 7794–7803 (2018)
Google Scholar
Zhang, S., Guo, S., Huang, W., Scott, M.R., Wang, L.: V4d: 4D convolutional neural networks for video-level representation learning. arXiv preprint arXiv:2002.07442 (2020)

Download references

Acknowledgement

The authors thank Bournemouth University PhD scholarship and Hengdaoruyi Company as well as the Rabin Ezra Scholarship Trust for partly supported this research.

Author information

Authors and Affiliations

National Centre for Computer Animation, Bournemouth University, Poole, UK
Jinglu Zhang, Yinyu Nie, Yao Lyu, Jian Chang, Xiaosong Yang & Jian Jun Zhang
Communication University of China, Beijing, China
Hailin Li

Authors

Jinglu Zhang
View author publications
You can also search for this author in PubMed Google Scholar
Yinyu Nie
View author publications
You can also search for this author in PubMed Google Scholar
Yao Lyu
View author publications
You can also search for this author in PubMed Google Scholar
Hailin Li
View author publications
You can also search for this author in PubMed Google Scholar
Jian Chang
View author publications
You can also search for this author in PubMed Google Scholar
Xiaosong Yang
View author publications
You can also search for this author in PubMed Google Scholar
Jian Jun Zhang
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Jian Chang .

Editor information

Editors and Affiliations

University of Toronto, Toronto, ON, Canada
Anne L. Martel
The University of British Columbia, Vancouver, BC, Canada
Purang Abolmaesumi
University College London, London, UK
Danail Stoyanov
École Centrale de Nantes, Nantes, France
Diana Mateus
EURECOM, Biot, France
Maria A. Zuluaga
Chinese Academy of Sciences, Beijing, China
S. Kevin Zhou
Sorbonne University, Paris, France
Daniel Racoceanu
The Hebrew University of Jerusalem, Jerusalem, Israel
Leo Joskowicz

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 155 KB)

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Zhang, J. et al. (2020). Symmetric Dilated Convolution for Surgical Gesture Recognition. In: Martel, A.L., et al. Medical Image Computing and Computer Assisted Intervention – MICCAI 2020. MICCAI 2020. Lecture Notes in Computer Science(), vol 12263. Springer, Cham. https://doi.org/10.1007/978-3-030-59716-0_39

Download citation

DOI: https://doi.org/10.1007/978-3-030-59716-0_39
Published: 29 September 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59715-3
Online ISBN: 978-3-030-59716-0
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Societies and partnerships

The Medical Image Computing and Computer Assisted Intervention Society (opens in a new tab)