Unifying Visual Contrastive Learning for Object Recognition from a Graph Perspective

  • Conference paper
Computer Vision – ECCV 2022 (ECCV 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13686)

Abstract

Recent contrastive-learning-based unsupervised object recognition methods leverage a Siamese architecture with two branches, each composed of a backbone, a projector layer, and an optional predictor layer. To learn the parameters of the backbone, existing methods use similar projector layer designs, while the major difference among them lies in the predictor layer. In this paper, we propose to Unify existing unsupervised Visual Contrastive Learning methods by using a GCN layer as the predictor layer (UniVCL), which brings two benefits to unsupervised object recognition. First, by treating the different predictor designs of existing methods as special cases of UniVCL, our fair and comprehensive experiments reveal the critical importance of neighborhood aggregation in the GCN predictor. Second, viewing the predictor from a graph perspective bridges visual self-supervised learning and graph representation learning, which allows us to introduce augmentations from graph representation learning into unsupervised object recognition and further improves unsupervised object recognition accuracy. Extensive experiments on linear evaluation and semi-supervised learning tasks demonstrate the effectiveness of UniVCL and of the introduced graph augmentations.
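
The idea of a predictor that aggregates information from neighboring samples in a batch can be illustrated with a small pure-Python sketch. It rests on assumptions that the abstract does not state: the graph is taken to be a symmetric k-nearest-neighbor graph over the batch's projector outputs, built with cosine similarity and self-loops, and normalized with the symmetric D^{-1/2} A D^{-1/2} rule used in standard GCNs. The names `knn_adjacency`, `gcn_predictor` and `drop_edges`, and the choice of edge dropping as the graph augmentation, are hypothetical illustrations, not UniVCL's actual implementation. Setting k = 0 leaves only self-loops, so the layer collapses to a neighborhood-free linear predictor. This mirrors, in simplified form, how predictors without aggregation can be seen as special cases of a GCN predictor.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def knn_adjacency(z, k):
    """Hypothetical graph construction: symmetric kNN graph over a batch, with self-loops."""
    n = len(z)
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = 1.0  # self-loop keeps each sample's own embedding in the aggregation
        sims = sorted(((cosine(z[i], z[j]), j) for j in range(n) if j != i), reverse=True)
        for _, j in sims[:k]:
            a[i][j] = a[j][i] = 1.0  # symmetrize: an edge i->j also adds j->i
    return a


def normalize(a):
    """Symmetric normalization D^{-1/2} A D^{-1/2}; degrees are >= 1 thanks to self-loops."""
    deg = [sum(row) for row in a]
    n = len(a)
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(x, y):
    """Plain nested-list matrix product."""
    return [[sum(x[i][t] * y[t][j] for t in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def gcn_predictor(z, w, k, adjacency=None):
    """One GCN layer H = A_hat Z W over the batch embeddings Z.

    With k = 0 the graph has only self-loops, A_hat is the identity, and the
    layer reduces to the neighborhood-free linear predictor Z W.
    """
    a = adjacency if adjacency is not None else knn_adjacency(z, k)
    return matmul(matmul(normalize(a), z), w)


def drop_edges(a, p, rng):
    """Hypothetical graph augmentation: drop each off-diagonal edge with probability p."""
    out = [row[:] for row in a]
    n = len(out)
    for i in range(n):
        for j in range(i + 1, n):
            if out[i][j] and rng.random() < p:
                out[i][j] = out[j][i] = 0.0  # keep the graph symmetric; self-loops survive
    return out


if __name__ == "__main__":
    z = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]  # toy projector outputs for a batch of 3
    w = [[0.5, 0.0], [0.0, 0.5]]              # stand-in for learned predictor weights
    a_aug = drop_edges(knn_adjacency(z, 1), 0.2, random.Random(0))
    print(gcn_predictor(z, w, k=1, adjacency=a_aug))
```

In a real training pipeline the weights `w` would be learned parameters in a deep learning framework and the graph would be rebuilt for every batch. Nested lists are used here only to keep the sketch dependency-free.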

Acknowledgement

This work was supported by the Australian Research Council Grant DP200103223, Australian Medical Research Future Fund MRFAI000085, CRC-P Smart Material Recovery Facility (SMRF) - Curby Soft Plastics, and CRC-P ARIA - Bionic Visual-Spatial Prosthesis for the Blind.

Author information

Corresponding author

Correspondence to Lei Bai.

Electronic supplementary material

Supplementary material 1 (PDF, 169 KB)

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Tang, S., Zhu, F., Bai, L., Zhao, R., Wang, C., Ouyang, W. (2022). Unifying Visual Contrastive Learning for Object Recognition from a Graph Perspective. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13686. Springer, Cham. https://doi.org/10.1007/978-3-031-19809-0_37

  • DOI: https://doi.org/10.1007/978-3-031-19809-0_37

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19808-3

  • Online ISBN: 978-3-031-19809-0

  • eBook Packages: Computer Science, Computer Science (R0)