Abstract
Purpose
Semantic segmentation and activity classification are key components in building intelligent surgical systems that can understand and assist clinical workflow. In the operating room (OR), semantic segmentation is at the core of creating robots aware of their clinical surroundings, whereas activity classification aims at understanding OR workflow at a higher level. State-of-the-art semantic segmentation and activity recognition approaches are fully supervised, which is not scalable. Self-supervision can decrease the amount of annotated data needed.
Methods
We propose a new 3D self-supervised task for OR scene understanding that utilizes OR scene images captured with time-of-flight (ToF) cameras. Unlike other self-supervised approaches, whose handcrafted pretext tasks focus on 2D image features, our proposed task consists of predicting the relative 3D distance between image patches by exploiting the depth maps. By learning 3D spatial context, the model generates discriminative features for our downstream tasks.
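As an illustration of the pretext task, the sketch below shows one way such 3D pseudo-labels could be derived from a ToF depth map: patch centers are back-projected to 3D with a pinhole camera model, and their pairwise Euclidean distances serve as regression targets. This is a minimal sketch under assumed intrinsics and a fixed patch grid; the function names, patch size, and intrinsics format are illustrative and do not reproduce the authors' implementation, which (per the title) predicts distances between clustered regions rather than a regular grid.

import numpy as np

def backproject_patch_centers(depth, intrinsics, patch_size=32):
    # Back-project the center pixel of each image patch to a 3D point
    # using the depth map and a pinhole camera model.
    fx, fy, cx, cy = intrinsics
    h, w = depth.shape
    centers = []
    for v in range(patch_size // 2, h, patch_size):
        for u in range(patch_size // 2, w, patch_size):
            z = depth[v, u]
            centers.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return np.asarray(centers)

def relative_distance_targets(depth, intrinsics, patch_size=32):
    # Pairwise 3D Euclidean distances between patch centers serve as
    # pseudo-labels for the self-supervised distance prediction task.
    pts = backproject_patch_centers(depth, intrinsics, patch_size)
    return np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)

# Example with a synthetic 480x640 depth map and illustrative intrinsics.
targets = relative_distance_targets(np.random.rand(480, 640) * 3.0,
                                    (525.0, 525.0, 320.0, 240.0))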
Results
We evaluate our approach on two tasks and datasets containing multiview data captured in clinical scenarios. We demonstrate a noteworthy performance improvement on both tasks, particularly in the low-data regime, where the utility of self-supervised learning is highest.
Conclusion
We propose a novel privacy-preserving self-supervised approach that utilizes depth maps. Our method performs on par with other self-supervised approaches and could be an interesting way to alleviate the burden of full supervision.
Acknowledgements
This work is supported by a PhD fellowship from Intuitive Surgical and by French state funds managed within the “Plan Investissements d’Avenir” by the ANR (reference ANR-10-IAHU-02).
Ethics declarations
Conflict of interest
Idris Hamoud is funded by a research scholarship from Intuitive Surgical. Nicolas Padoy is a scientific advisor to Caresyntax on topics unrelated to this study. The other authors declare that they have no conflict of interest.
Ethical approval
All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Declaration of Helsinki and its later amendments or comparable ethical standards.
Informed consent
Data have been collected within an Institutional Review Board (IRB)-approved study, and all participants’ informed consent has been obtained.
Cite this article
Hamoud, I., Karargyris, A., Sharghi, A. et al. Self-supervised learning via cluster distance prediction for operating room context awareness. Int J CARS 17, 1469–1476 (2022). https://doi.org/10.1007/s11548-022-02629-9