Abstract
Self-supervised pretraining has shown impressive performance in recent years, matching or even outperforming ImageNet weights on a broad range of downstream tasks. Unfortunately, existing methods require massive amounts of computing power, with large batch sizes and batch norm statistics synchronized across multiple GPUs. This effectively excludes substantial parts of the computer vision community, namely those without access to extensive computing resources, from the benefits of self-supervised learning.
To address this, we develop FastSiam with the aim of matching ImageNet weights using as little computing power as possible. We find that a core weakness of previous methods such as SimSiam is that they compute the training target based on a single augmented crop (or “view”), leading to target instability. We show that by using multiple views per image instead of one, the training target can be stabilized, allowing for faster convergence and substantially reduced runtime. We evaluate FastSiam on multiple challenging downstream tasks, including object detection, instance segmentation, and keypoint detection, and find that it matches ImageNet weights after 25 epochs of pretraining on a single GPU with a batch size of only 32.
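To make the multi-view target concrete, below is a minimal, hypothetical PyTorch sketch of the idea stated in the abstract, not the authors' released implementation: the names (SiameseModel, fastsiam_loss), the toy linear encoder, and the choice of four views are all illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SiameseModel(nn.Module):
    """Toy SimSiam-style model: encoder + prediction MLP (illustrative only)."""

    def __init__(self, dim=128, pred_dim=64):
        super().__init__()
        # Stand-in encoder; the paper's setup uses a ResNet backbone with a
        # projection MLP, which this flattening linear layer merely mimics.
        self.encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, dim))
        # Prediction head applied to the online branch, as in SimSiam.
        self.predictor = nn.Sequential(
            nn.Linear(dim, pred_dim),
            nn.ReLU(inplace=True),
            nn.Linear(pred_dim, dim),
        )


def fastsiam_loss(model, views):
    """Negative cosine loss with a multi-view target.

    SimSiam builds the training target from a single augmented view; the
    FastSiam idea is to average the encoder outputs of several views so the
    target is more stable. `views` is a list of augmented crops of the same
    image batch.
    """
    online, target_views = views[0], views[1:]
    p = model.predictor(model.encoder(online))
    with torch.no_grad():  # stop-gradient: the target receives no gradients
        z = torch.stack([model.encoder(v) for v in target_views]).mean(dim=0)
    return -F.cosine_similarity(p, z, dim=-1).mean()


# Usage with dummy data: 4 views per image and batch size 32 (the batch size
# reported in the abstract); real training would loop over a dataloader.
model = SiameseModel()
views = [torch.randn(32, 3, 32, 32) for _ in range(4)]
loss = fastsiam_loss(model, views)
loss.backward()
```

Averaging the encoder outputs of several views lowers the variance of the regression target relative to a single-view target, which is consistent with the stabilization and faster convergence the abstract claims.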
References
Bar, A., et al.: DetReg: unsupervised pretraining with region priors for object detection (2021). arXiv:2106.04550
Caron, M., Bojanowski, P., Joulin, A., Douze, M.: Deep clustering for unsupervised learning of visual features. In: Proceedings of the European Conference on Computer Vision (ECCV) (2018)
Caron, M., Misra, I., Mairal, J., Goyal, P., Bojanowski, P., Joulin, A.: Unsupervised learning of visual features by contrasting cluster assignments. In: Advances in Neural Information Processing Systems, vol. 33, pp. 9912–9924 (2020)
Caron, M., et al.: Emerging properties in self-supervised vision transformers (2021). arXiv:2104.14294
Chen, K., Hong, L., Xu, H., Li, Z., Yeung, D.Y.: MultiSiam: self-supervised multi-instance Siamese representation learning for autonomous driving. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 7546–7554 (2021)
Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A simple framework for contrastive learning of visual representations. In: Proceedings of the 37th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 119, pp. 1597–1607 (2020)
Chen, X., Fan, H., Girshick, R., He, K.: Improved baselines with momentum contrastive learning (2020). arXiv:2003.04297
Chen, X., He, K.: Exploring simple Siamese representation learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 15750–15758 (2021)
Chen, X., He, K.: SimSiam: exploring simple Siamese representation learning (2021). https://github.com/facebookresearch/simsiam
Chen, X., Xie, S., He, K.: An empirical study of training self-supervised vision transformers. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 9640–9649 (2021)
Cordts, M., et al.: The Cityscapes dataset for semantic urban scene understanding (2016). arXiv:1604.01685
Dai, Z., Cai, B., Lin, Y., Chen, J.: UP-DETR: unsupervised pre-training for object detection with transformers. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1601–1610 (2021)
Ding, J., et al.: Unsupervised pretraining for object detection by patch reidentification (2021). arXiv:2103.04814
Doersch, C., Gupta, A., Efros, A.A.: Unsupervised visual representation learning by context prediction. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV) (2015)
Gidaris, S., Bursuc, A., Puy, G., Komodakis, N., Cord, M., Perez, P.: OBoW: online bag-of-visual-words generation for self-supervised learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6830–6840 (2021)
Gidaris, S., Singh, P., Komodakis, N.: Unsupervised representation learning by predicting image rotations. In: International Conference on Learning Representations (ICLR) (2018)
Grill, J.B., et al.: Bootstrap your own latent: a new approach to self-supervised learning (2020). arXiv:2006.07733
He, K., Fan, H., Wu, Y., Xie, S., Girshick, R.: Momentum contrast for unsupervised visual representation learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020)
He, K., Gkioxari, G., Dollar, P., Girshick, R.: Mask R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), October 2017
Larsson, G., Maire, M., Shakhnarovich, G.: Colorization as a proxy task for visual understanding. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
Lin, T.Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection (2017). arXiv:1612.03144
Lin, T.Y., et al.: Microsoft COCO: common objects in context (2015). arXiv:1405.0312
Liu, S., Li, Z., Sun, J.: Self-EMD: self-supervised object detection without ImageNet. CoRR (2020). arXiv:2011.13677
Noroozi, M., Favaro, P.: Unsupervised learning of visual representations by solving jigsaw puzzles. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9910, pp. 69–84. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46466-4_5
Pinheiro, P., Almahairi, A., Benmalek, R., Golemo, F., Courville, A.: Unsupervised learning of dense visual representations. In: Advances in Neural Information Processing Systems, vol. 33, pp. 4489–4500 (2020)
Russakovsky, O., et al.: ImageNet large scale visual recognition challenge (2015). arXiv:1409.0575
Tian, Y., Sun, C., Poole, B., Krishnan, D., Schmid, C., Isola, P.: What makes for good views for contrastive learning? In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 6827–6839. Curran Associates, Inc. (2020)
Wang, X., Zhang, R., Shen, C., Kong, T., Li, L.: Dense contrastive learning for self-supervised visual pre-training. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3024–3033 (2021)
Wu, Y., Kirillov, A., Massa, F., Lo, W.Y., Girshick, R.: Detectron2 (2019). https://github.com/facebookresearch/detectron2
Wu, Z., Xiong, Y., Yu, S.X., Lin, D.: Unsupervised feature learning via non-parametric instance discrimination. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018
Xie, E., et al.: DetCo: unsupervised contrastive learning for object detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 8392–8401 (2021)
Xie, Z., et al.: Self-supervised learning with swin transformers (2021). arXiv:2105.04553
Zbontar, J., Jing, L., Misra, I., LeCun, Y., Deny, S.: Barlow twins: self-supervised learning via redundancy reduction. In: International Conference on Machine Learning (ICML) (2021)
Zhou, J., et al.: iBOT: image BERT pre-training with online tokenizer (2021). arXiv:2111.07832
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Pototzky, D., Sultan, A., Schmidt-Thieme, L. (2022). FastSiam: Resource-Efficient Self-supervised Learning on a Single GPU. In: Andres, B., Bernard, F., Cremers, D., Frintrop, S., Goldlücke, B., Ihrke, I. (eds) Pattern Recognition. DAGM GCPR 2022. Lecture Notes in Computer Science, vol 13485. Springer, Cham. https://doi.org/10.1007/978-3-031-16788-1_4
Print ISBN: 978-3-031-16787-4
Online ISBN: 978-3-031-16788-1