GKGNet: Group K-Nearest Neighbor Based Graph Convolutional Network for Multi-label Image Recognition

  • Conference paper
Computer Vision – ECCV 2024 (ECCV 2024)

Abstract

Multi-Label Image Recognition (MLIR) is a challenging task that aims to predict multiple object labels in a single image while modeling the complex relationships between labels and image regions. Although convolutional neural networks and vision transformers have succeeded in processing images as regular grids of pixels or patches, these representations are sub-optimal for capturing irregular and discontinuous regions of interest. In this work, we present the first fully graph convolutional model, Group K-nearest neighbor based Graph convolutional Network (GKGNet), which models the connections between semantic label embeddings and image patches in a flexible and unified graph structure. To address the scale variance of different objects and to capture information from multiple perspectives, we propose the Group KGCN module for dynamic graph construction and message passing. Experiments on the challenging multi-label datasets MS-COCO and VOC2007 demonstrate that GKGNet achieves state-of-the-art performance at significantly lower computational cost. Code is available at https://github.com/jin-s13/GKGNet.
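The abstract does not spell out how the Group KGCN module builds its graph, so the following is only an illustrative sketch of one plausible reading, not the authors' implementation. It assumes that feature channels are split into equal groups, that each label node picks its k nearest image patches separately within each group (by squared Euclidean distance), and that messages are passed by mean aggregation. The function names `split_channels`, `group_knn`, and `aggregate`, the distance metric, and the aggregation rule are all hypothetical choices made for illustration.

```python
from typing import List

Vector = List[float]


def split_channels(vec: Vector, num_groups: int) -> List[Vector]:
    """Split a feature vector into equal-sized channel groups."""
    size = len(vec) // num_groups
    return [vec[g * size:(g + 1) * size] for g in range(num_groups)]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance (an assumed metric, not from the paper)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def group_knn(labels: List[Vector], patches: List[Vector],
              num_groups: int, k: int) -> List[List[List[int]]]:
    """Return neighbors[l][g]: indices of the k patches nearest to label l,
    measured only on channel group g."""
    patch_groups = [split_channels(p, num_groups) for p in patches]
    neighbors = []
    for label in labels:
        label_groups = split_channels(label, num_groups)
        per_group = []
        for g in range(num_groups):
            dists = [squared_distance(label_groups[g], pg[g])
                     for pg in patch_groups]
            # Stable sort: ties are broken by the lower patch index.
            ranked = sorted(range(len(patches)), key=dists.__getitem__)
            per_group.append(ranked[:k])
        neighbors.append(per_group)
    return neighbors


def aggregate(patches: List[Vector], neighbors: List[List[List[int]]],
              num_groups: int) -> List[Vector]:
    """Mean-aggregate neighbor sub-features per group, then concatenate
    the groups back into one feature vector per label."""
    patch_groups = [split_channels(p, num_groups) for p in patches]
    updated = []
    for per_group in neighbors:
        feature: Vector = []
        for g, idxs in enumerate(per_group):
            subs = [patch_groups[i][g] for i in idxs]
            feature.extend(sum(col) / len(subs) for col in zip(*subs))
        updated.append(feature)
    return updated


# Toy example: one label node, three patch nodes, four channels, two groups.
labels = [[0.0, 0.0, 10.0, 10.0]]
patches = [[0.0, 0.0, 0.0, 0.0],
           [1.0, 1.0, 9.0, 9.0],
           [5.0, 5.0, 10.0, 10.0]]
nbrs = group_knn(labels, patches, num_groups=2, k=2)
print(nbrs)                                     # [[[0, 1], [2, 1]]]
print(aggregate(patches, nbrs, num_groups=2))   # [[0.5, 0.5, 9.5, 9.5]]
```

In the toy example the two channel groups select different patch sets for the same label, which is the intuition behind connecting to patches "from multiple perspectives": a single kNN over all channels would return only one neighbor set.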

Acknowledgement

This paper is partially supported by the National Key R&D Program of China (No. 2022ZD0161000) and the General Research Fund of Hong Kong (Nos. 17200622 and 17209324).

Author information

Corresponding authors

Correspondence to Chen Qian or Ji Wu.

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 15866 KB)

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Yao, R. et al. (2025). GKGNet: Group K-Nearest Neighbor Based Graph Convolutional Network for Multi-label Image Recognition. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds) Computer Vision – ECCV 2024. ECCV 2024. Lecture Notes in Computer Science, vol 15076. Springer, Cham. https://doi.org/10.1007/978-3-031-72649-1_6

  • DOI: https://doi.org/10.1007/978-3-031-72649-1_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-72648-4

  • Online ISBN: 978-3-031-72649-1

  • eBook Packages: Computer Science, Computer Science (R0)
