Exploring the Effect of Dataset Diversity in Self-supervised Learning for Surgical Computer Vision

Jaspers, Tim J. M.; de Jong, Ronald L. P. D.; Al Khalil, Yasmina; Zeelenberg, Tijn; Kusters, Carolus H. J.; Li, Yiping; van Jaarsveld, Romy C.; Bakker, Franciscus H. A.; Ruurda, Jelle P.; Brinkman, Willem M.; De With, Peter H. N.; van der Sommen, Fons

doi:10.1007/978-3-031-73748-0_5

Tim J. M. Jaspers ORCID: orcid.org/0009-0001-8306-5058¹⁵,
Ronald L. P. D. de Jong ORCID: orcid.org/0009-0005-7806-4340¹⁶,
Yasmina Al Khalil¹⁶,
Tijn Zeelenberg¹⁵,
Carolus H. J. Kusters ORCID: orcid.org/0009-0004-3114-3888¹⁵,
Yiping Li¹⁶,
Romy C. van Jaarsveld¹⁷,
Franciscus H. A. Bakker¹⁸,¹⁹,
Jelle P. Ruurda¹⁷,
Willem M. Brinkman¹⁸,
Peter H. N. De With¹⁵ &
Fons van der Sommen ORCID: orcid.org/0000-0002-3593-2356¹⁵

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15265)

Included in the following conference series:

MICCAI Workshop on Data Engineering in Medical Imaging

Abstract

Over the past decade, computer vision applications in minimally invasive surgery have rapidly increased. Despite this growth, the impact of surgical computer vision remains limited compared to other medical fields such as pathology and radiology, primarily due to the scarcity of representative annotated data. Whereas transfer learning from large annotated datasets such as ImageNet has conventionally been the norm for achieving high-performing models, recent advancements in self-supervised learning (SSL) have demonstrated superior performance. In medical image analysis, in-domain SSL pretraining has already been shown to outperform ImageNet-based initialization. Although unlabeled data in the field of surgical computer vision is abundant, the diversity within this data is limited. This study investigates the role of dataset diversity in SSL for surgical computer vision, comparing procedure-specific datasets against a more heterogeneous general surgical dataset across three different downstream surgical applications. The results show that using solely procedure-specific data can lead to substantial improvements of 13.8%, 9.5%, and 36.8% compared to ImageNet pretraining. However, extending this data with more heterogeneous surgical data further increases performance by an additional 5.0%, 5.2%, and 2.5%, suggesting that increasing diversity within SSL data is beneficial for model performance. The code and pretrained model weights are made publicly available at https://github.com/TimJaspers0801/SurgeNet.

Acknowledgements

We thank SURF (www.surf.nl) for the support in using the National Supercomputer Snellius.

Author information

Authors and Affiliations

Department of Electrical Engineering, Video Coding and Architectures, Eindhoven University of Technology, Eindhoven, The Netherlands
Tim J. M. Jaspers, Tijn Zeelenberg, Carolus H. J. Kusters, Peter H. N. De With & Fons van der Sommen
Department of Biomedical Engineering, Medical Image Analysis, Eindhoven University of Technology, Eindhoven, The Netherlands
Ronald L. P. D. de Jong, Yasmina Al Khalil & Yiping Li
Department of Surgery, University Medical Center Utrecht, Utrecht, The Netherlands
Romy C. van Jaarsveld & Jelle P. Ruurda
Department of Oncological Urology, University Medical Center Utrecht, Utrecht, The Netherlands
Franciscus H. A. Bakker & Willem M. Brinkman
Department of Urology, Catharina Hospital, Eindhoven, The Netherlands
Franciscus H. A. Bakker

Corresponding author

Correspondence to Tim J. M. Jaspers.

Editor information

Editors and Affiliations

University of Aberdeen, Aberdeen, UK
Binod Bhattarai
University of Leeds, Leeds, UK
Sharib Ali
Stanford University, Stanford, CA, USA
Anita Rau
University College London, London, UK
Razvan Caramalau
University of Liverpool, Liverpool, UK
Anh Nguyen
West Virginia University, Morgantown, WV, USA
Prashnna Gyawali
University of Oxford, Oxford, UK
Ana Namburete
University College London, London, UK
Danail Stoyanov

About this paper

Cite this paper

Jaspers, T.J.M. et al. (2025). Exploring the Effect of Dataset Diversity in Self-supervised Learning for Surgical Computer Vision. In: Bhattarai, B., et al. Data Engineering in Medical Imaging. DEMI 2024. Lecture Notes in Computer Science, vol 15265. Springer, Cham. https://doi.org/10.1007/978-3-031-73748-0_5

DOI: https://doi.org/10.1007/978-3-031-73748-0_5
Published: 25 October 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-73747-3
Online ISBN: 978-3-031-73748-0
eBook Packages: Computer Science, Computer Science (R0)