Abstract
Rich temporal information and variations in viewpoint make video data an attractive choice for learning image representations with unsupervised contrastive learning (UCL) techniques. State-of-the-art (SOTA) contrastive learning techniques treat frames within a video as positives in the embedding space, and frames from other videos as negatives. We observe that, unlike natural scene videos, which show multiple views of an object, an ultrasound (US) video captures different 2D slices of an organ. Hence, there is almost no similarity between temporally distant frames, even within the same US video. In this paper, we propose to instead use such frames as hard negatives. We advocate mining both intra-video and cross-video negatives with a hardness-sensitive negative mining curriculum in a UCL framework to learn rich image representations. We deploy our framework to learn representations of gallbladder (GB) malignancy from US videos. We also construct the first large-scale US video dataset, containing 64 videos and 15,800 frames, for learning GB representations. We show that a standard ResNet50 backbone trained with our framework improves accuracy on the GB malignancy detection task by 2–6% over models pretrained with SOTA UCL techniques as well as models pretrained with supervision on ImageNet. We further validate the generalizability of our method on a publicly available lung US image dataset of COVID-19 pathologies and show an improvement of 1.5% over SOTA. Source code, dataset, and models are available at https://gbc-iitd.github.io/usucl.
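The idea of treating temporally distant frames of the same video as negatives can be illustrated with a minimal sketch. It is not the authors' implementation: it uses a generic InfoNCE-style loss as a stand-in, omits the hardness-sensitive curriculum entirely, and operates on random vectors instead of real frame embeddings. The function names and the thresholds `pos_window` and `neg_gap` are hypothetical and do not come from the paper.

```python
import numpy as np


def split_frames_by_temporal_distance(anchor_idx, num_frames, pos_window=2, neg_gap=10):
    """Return (positive_indices, intra_video_negative_indices) for one anchor.

    pos_window and neg_gap are hypothetical thresholds chosen for illustration.
    """
    idx = np.arange(num_frames)
    dist = np.abs(idx - anchor_idx)
    positives = idx[(dist > 0) & (dist <= pos_window)]  # temporally close frames
    negatives = idx[dist >= neg_gap]                    # temporally distant frames
    return positives, negatives


def l2_normalize(x):
    """Scale vectors to unit length along the last axis."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss for one anchor, one positive, and a stack of negatives."""
    anchor, positive, negatives = map(l2_normalize, (anchor, positive, negatives))
    pos_logit = anchor @ positive / temperature
    neg_logits = negatives @ anchor / temperature
    logits = np.concatenate(([pos_logit], neg_logits))
    logits -= logits.max()  # subtract the max for numerical stability
    return -(logits[0] - np.log(np.exp(logits).sum()))


# Stand-in data: random vectors play the role of frame embeddings.
rng = np.random.default_rng(0)
frames = rng.normal(size=(30, 8))  # one hypothetical 30-frame video
other = rng.normal(size=(5, 8))    # frames from other videos (cross-video negatives)

pos_idx, neg_idx = split_frames_by_temporal_distance(anchor_idx=12, num_frames=30)
# Pool intra-video (temporally distant) and cross-video negatives together.
negatives = np.concatenate([frames[neg_idx], other])
loss = info_nce(frames[12], frames[pos_idx[0]], negatives)
```

In this sketch every negative carries equal weight. A curriculum such as the one the abstract describes would instead control which negatives are used, or how strongly, as training progresses.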
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Basu, S., Singla, S., Gupta, M., Rana, P., Gupta, P., Arora, C. (2022). Unsupervised Contrastive Learning of Image Representations from Ultrasound Videos with Hard Negative Mining. In: Wang, L., Dou, Q., Fletcher, P.T., Speidel, S., Li, S. (eds) Medical Image Computing and Computer Assisted Intervention – MICCAI 2022. MICCAI 2022. Lecture Notes in Computer Science, vol 13434. Springer, Cham. https://doi.org/10.1007/978-3-031-16440-8_41
DOI: https://doi.org/10.1007/978-3-031-16440-8_41
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-16439-2
Online ISBN: 978-3-031-16440-8
eBook Packages: Computer Science, Computer Science (R0)