Abstract
Self-supervised pretraining has shown impressive performance in recent years, matching or even outperforming ImageNet weights on a broad range of downstream tasks. Unfortunately, existing methods require massive amounts of computing power, with large batch sizes and batch norm statistics synchronized across multiple GPUs. This effectively excludes substantial parts of the computer vision community, namely those without access to extensive computing resources, from the benefits of self-supervised learning.
To address this, we develop FastSiam with the aim of matching ImageNet weights using as little computing power as possible. We find that a core weakness of previous methods such as SimSiam is that they compute the training target based on a single augmented crop (or “view”), leading to target instability. We show that by using multiple views per image instead of one, the training target can be stabilized, allowing for faster convergence and substantially reduced runtime. We evaluate FastSiam on multiple challenging downstream tasks, including object detection, instance segmentation, and keypoint detection, and find that it matches ImageNet weights after 25 epochs of pretraining on a single GPU with a batch size of only 32.
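To make the multi-view target concrete, below is a minimal, hypothetical PyTorch sketch of the idea stated in the abstract, not the authors' released implementation: the names (SiameseModel, fastsiam_loss), the toy linear encoder, and the choice of four views are all illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SiameseModel(nn.Module):
    """Toy SimSiam-style model: encoder + prediction MLP (illustrative only)."""

    def __init__(self, dim=128, pred_dim=64):
        super().__init__()
        # Stand-in encoder; the paper's setup uses a ResNet backbone with a
        # projection MLP, which this flattening linear layer merely mimics.
        self.encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, dim))
        # Prediction head applied to the online branch, as in SimSiam.
        self.predictor = nn.Sequential(
            nn.Linear(dim, pred_dim),
            nn.ReLU(inplace=True),
            nn.Linear(pred_dim, dim),
        )


def fastsiam_loss(model, views):
    """Negative cosine loss with a multi-view target.

    SimSiam builds the training target from a single augmented view; the
    FastSiam idea is to average the encoder outputs of several views so the
    target is more stable. `views` is a list of augmented crops of the same
    image batch.
    """
    online, target_views = views[0], views[1:]
    p = model.predictor(model.encoder(online))
    with torch.no_grad():  # stop-gradient: the target receives no gradients
        z = torch.stack([model.encoder(v) for v in target_views]).mean(dim=0)
    return -F.cosine_similarity(p, z, dim=-1).mean()


# Usage with dummy data: 4 views per image and batch size 32 (the batch size
# reported in the abstract); real training would loop over a dataloader.
model = SiameseModel()
views = [torch.randn(32, 3, 32, 32) for _ in range(4)]
loss = fastsiam_loss(model, views)
loss.backward()
```

Averaging the encoder outputs of several views lowers the variance of the regression target relative to a single-view target, which is consistent with the stabilization and faster convergence the abstract claims.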
References
Bar, A., et al.: DetReg: unsupervised pretraining with region priors for object detection (2021). arXiv:2106.04550
Caron, M., Bojanowski, P., Joulin, A., Douze, M.: Deep clustering for unsupervised learning of visual features. In: Proceedings of the European Conference on Computer Vision (ECCV) (2018)
Caron, M., Misra, I., Mairal, J., Goyal, P., Bojanowski, P., Joulin, A.: Unsupervised learning of visual features by contrasting cluster assignments. In: Advances in Neural Information Processing Systems, vol. 33, pp. 9912–9924 (2020)
Caron, M., et al.: Emerging properties in self-supervised vision transformers (2021). arXiv:2104.14294
Chen, K., Hong, L., Xu, H., Li, Z., Yeung, D.Y.: MultiSiam: self-supervised multi-instance Siamese representation learning for autonomous driving. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 7546–7554 (2021)
Chen, T., Kornblith, S., Norouzi, M., Hinton, G.: A simple framework for contrastive learning of visual representations. In: Proceedings of the 37th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 119, pp. 1597–1607 (2020)
Chen, X., Fan, H., Girshick, R., He, K.: Improved baselines with momentum contrastive learning (2020). arXiv:2003.04297
Chen, X., He, K.: Exploring simple Siamese representation learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 15750–15758 (2021)
Chen, X., He, K.: SimSiam: exploring simple Siamese representation learning (2021). https://github.com/facebookresearch/simsiam
Chen, X., Xie, S., He, K.: An empirical study of training self-supervised vision transformers. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 9640–9649 (2021)
Cordts, M., et al.: The Cityscapes dataset for semantic urban scene understanding (2016). arXiv:1604.01685
Dai, Z., Cai, B., Lin, Y., Chen, J.: UP-DETR: unsupervised pre-training for object detection with transformers. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1601–1610 (2021)
Ding, J., et al.: Unsupervised pretraining for object detection by patch reidentification (2021). arXiv:2103.04814
Doersch, C., Gupta, A., Efros, A.A.: Unsupervised visual representation learning by context prediction. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV) (2015)
Gidaris, S., Bursuc, A., Puy, G., Komodakis, N., Cord, M., Perez, P.: OBoW: online bag-of-visual-words generation for self-supervised learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6830–6840 (2021)
Gidaris, S., Singh, P., Komodakis, N.: Unsupervised representation learning by predicting image rotations. In: International Conference on Learning Representations (ICLR) (2018)
Grill, J.B., et al.: Bootstrap your own latent: a new approach to self-supervised learning (2020). arXiv:2006.07733
He, K., Fan, H., Wu, Y., Xie, S., Girshick, R.: Momentum contrast for unsupervised visual representation learning. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020)
He, K., Gkioxari, G., Dollar, P., Girshick, R.: Mask R-CNN. In: Proceedings of the IEEE International Conference on Computer Vision (ICCV), October 2017
Larsson, G., Maire, M., Shakhnarovich, G.: Colorization as a proxy task for visual understanding. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
Lin, T.Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection (2017). arXiv:1612.03144
Lin, T.Y., et al.: Microsoft COCO: common objects in context (2015). arXiv:1405.0312
Liu, S., Li, Z., Sun, J.: Self-EMD: self-supervised object detection without ImageNet. CoRR (2020). arXiv:2011.13677
Noroozi, M., Favaro, P.: Unsupervised learning of visual representations by solving jigsaw puzzles. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9910, pp. 69–84. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46466-4_5
Pinheiro, P., Almahairi, A., Benmalek, R., Golemo, F., Courville, A.: Unsupervised learning of dense visual representations. In: Advances in Neural Information Processing Systems, vol. 33, pp. 4489–4500 (2020)
Russakovsky, O., et al.: ImageNet large scale visual recognition challenge (2015). arXiv:1409.0575
Tian, Y., Sun, C., Poole, B., Krishnan, D., Schmid, C., Isola, P.: What makes for good views for contrastive learning? In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M.F., Lin, H. (eds.) Advances in Neural Information Processing Systems, vol. 33, pp. 6827–6839. Curran Associates, Inc. (2020)
Wang, X., Zhang, R., Shen, C., Kong, T., Li, L.: Dense contrastive learning for self-supervised visual pre-training. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3024–3033 (2021)
Wu, Y., Kirillov, A., Massa, F., Lo, W.Y., Girshick, R.: Detectron2 (2019). https://github.com/facebookresearch/detectron2
Wu, Z., Xiong, Y., Yu, S.X., Lin, D.: Unsupervised feature learning via non-parametric instance discrimination. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2018
Xie, E., et al.: DetCo: unsupervised contrastive learning for object detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 8392–8401 (2021)
Xie, Z., et al.: Self-supervised learning with swin transformers (2021). arXiv:2105.04553
Zbontar, J., Jing, L., Misra, I., LeCun, Y., Deny, S.: Barlow twins: self-supervised learning via redundancy reduction. In: International Conference on Machine Learning (ICML) (2021)
Zhou, J., et al.: iBOT: image BERT pre-training with online tokenizer (2021). arXiv:2111.07832
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Pototzky, D., Sultan, A., Schmidt-Thieme, L. (2022). FastSiam: Resource-Efficient Self-supervised Learning on a Single GPU. In: Andres, B., Bernard, F., Cremers, D., Frintrop, S., Goldlücke, B., Ihrke, I. (eds) Pattern Recognition. DAGM GCPR 2022. Lecture Notes in Computer Science, vol 13485. Springer, Cham. https://doi.org/10.1007/978-3-031-16788-1_4
Print ISBN: 978-3-031-16787-4
Online ISBN: 978-3-031-16788-1