Abstract
In recent years, deep learning has achieved remarkable success thanks to advanced neural network architectures and large-scale datasets manually labeled by humans. However, labeling large datasets accurately and efficiently is often costly and challenging, particularly in fields that require specialized labeling expertise, such as healthcare. Building a model capable of large-scale image retrieval without extensive manual labeling is therefore an important need. This study proposes GLGM (Graph network combined with Local and Global features based on Multi-label techniques), a multi-label learning method based on attentive graph convolutions, to address fine-grained classification on coarsely labeled datasets. Specifically, within a contrastive learning framework, our method generates labels that are interconnected through graph convolutions. Unlike self-supervised contrastive learning methods that link global and local image features into a graph representing specific object characteristics, GLGM introduces a common search space that supports image retrieval both in the educational domain and in general, using advanced sample-distance search algorithms. We demonstrate that GLGM encompasses many state-of-the-art approaches as special cases. Experiments show that GLGM achieves significant improvements over existing state-of-the-art methods on several datasets, including CIFAR-10 and MLIC-Edu, a dataset we collected for educational image retrieval.
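The abstract describes GLGM only at a high level, so the following is a minimal pure-Python sketch under stated assumptions, not the authors' implementation. It illustrates the two ingredients the abstract names: an attentive graph convolution over label nodes (here a standard single-head GAT-style step, which may differ from the paper's layer) and retrieval by distance search in a shared space (here exhaustive cosine-similarity ranking). The function names `gat_layer`, `label_scores` and `retrieve_top_k`, the toy data, and the choice of multi-label score vectors as the "common search space" are all hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def matvec(W: Sequence[Sequence[float]], x: Sequence[float]) -> Vector:
    """Multiply matrix W (rows x cols) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def leaky_relu(z: float, slope: float = 0.2) -> float:
    return z if z > 0 else slope * z


def sigmoid(z: float) -> float:
    """Numerically stable logistic function (no overflow for large |z|)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gat_layer(H: List[Vector], adj: List[List[int]],
              W: List[Vector], a: Vector) -> List[Vector]:
    """One single-head GAT-style attention step over label nodes (assumed design).

    H:   one feature vector per label node
    adj: adj[i] lists the neighbours of node i (include i itself as a self-loop)
    W:   shared projection matrix (out_dim x in_dim)
    a:   attention vector of length 2 * out_dim, split as [a_left || a_right]
    The output nonlinearity and multi-head attention of full GAT are omitted.
    """
    Z = [matvec(W, h) for h in H]  # project every node with the shared W
    d = len(Z[0])
    out = []
    for i, nbrs in enumerate(adj):
        src = sum(a[k] * Z[i][k] for k in range(d))  # a_left . z_i
        # e_ij = LeakyReLU(a^T [z_i || z_j]) = LeakyReLU(a_left . z_i + a_right . z_j)
        scores = [leaky_relu(src + sum(a[d + k] * Z[j][k] for k in range(d)))
                  for j in nbrs]
        # Softmax over the neighbourhood, shifted by the max for stability.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        # Attention-weighted sum of projected neighbour features.
        out.append([sum(al * Z[j][k] for al, j in zip(alphas, nbrs))
                    for k in range(d)])
    return out


def label_scores(classifiers: List[Vector], feature: Vector) -> Vector:
    """Per-label probabilities sigmoid(c_l . f), treating each refined label
    embedding c_l as a linear classifier for the image feature f."""
    return [sigmoid(sum(c * x for c, x in zip(cl, feature))) for cl in classifiers]


def normalize(v: Vector) -> Vector:
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def retrieve_top_k(query: Vector, database: List[Vector], k: int) -> List[int]:
    """Indices of the k database vectors most cosine-similar to the query."""
    q = normalize(query)
    sims = [sum(x * y for x, y in zip(q, normalize(v))) for v in database]
    return sorted(range(len(database)), key=lambda i: sims[i], reverse=True)[:k]


if __name__ == "__main__":
    # Toy setup: 3 labels with 2-d embeddings and a hypothetical co-occurrence graph.
    label_emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    adj = [[0, 2], [1, 2], [2, 0, 1]]
    W = [[1.0, 0.0], [0.0, 1.0]]
    a = [0.5, -0.5, 0.5, -0.5]
    classifiers = gat_layer(label_emb, adj, W, a)
    # Hypothetical image features; their label-score vectors serve as retrieval keys.
    gallery = [label_scores(classifiers, f) for f in ([2.0, 0.1], [0.1, 2.0], [1.5, 1.5])]
    query = label_scores(classifiers, [1.8, 0.2])
    print(retrieve_top_k(query, gallery, k=2))
```

The exhaustive ranking in `retrieve_top_k` is kept for clarity; for large galleries it would typically be replaced by an approximate nearest-neighbour index over L2-normalized vectors, where inner product equals cosine similarity.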
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11760-024-03630-2/MediaObjects/11760_2024_3630_Fig1_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11760-024-03630-2/MediaObjects/11760_2024_3630_Fig2_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11760-024-03630-2/MediaObjects/11760_2024_3630_Fig3_HTML.png)
![](http://media.springernature.com/m312/springer-static/image/art%3A10.1007%2Fs11760-024-03630-2/MediaObjects/11760_2024_3630_Fig4_HTML.png)
Data availability
This paper uses the public CIFAR-10 dataset and MLIC-Edu, a dataset collected by the authors. The use of data in this study follows the guidelines of the datasets' authors.
Acknowledgements
This research is funded by the Posts and Telecommunications Institute of Technology (PTIT), Vietnam under grant number ‘12-2024-HV-CNTT1’. The authors would like to thank PTIT for the financial support.
Author information
Authors and Affiliations
Contributions
All authors contributed to the study's conception and design. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Conflict of interest
No conflict of interest exists in the submission of this manuscript, and the manuscript is approved by all authors for publication.
Ethics approval
Consent was obtained from all participants prior to their involvement in the study, and they were informed of their right to withdraw at any time without consequence.
Consent to participate
All authors agreed to participate in the construction and development of this research topic.
Consent to publication
All authors agreed to make this study public.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Nguyen, V.T., Nguyen, H.Q., Tran, A.D. et al. Multi-label guided graph attention network for education image retrieval. SIViP 19, 19 (2025). https://doi.org/10.1007/s11760-024-03630-2
Received:
Revised:
Accepted:
Published:
DOI: https://doi.org/10.1007/s11760-024-03630-2