Multi-label guided graph attention network for education image retrieval

Nguyen, Van Thanh; Nguyen, Huu Quynh; Tran, Anh Dat; Dao, Thi Thuy Quynh

doi:10.1007/s11760-024-03630-2

Original Paper
Published: 01 December 2024

Volume 19, article number 19 (2025)

Signal, Image and Video Processing

Van Thanh Nguyen¹,
Huu Quynh Nguyen²,
Anh Dat Tran ORCID: orcid.org/0000-0002-8924-4356³^na1 &
…
Thi Thuy Quynh Dao⁴^na1

Abstract

In recent years, deep learning has achieved remarkable success thanks to advanced neural network architectures and large-scale, manually labeled datasets. However, labeling large datasets accurately and efficiently is often costly and difficult, particularly in fields that require specialized labeling expertise, such as healthcare. In this context, a model capable of large-scale image retrieval without extensive manual labeling is a crucial need. This study proposes GLGM (Graph network combined with Local and Global features based on Multi-label techniques), a multi-label learning method based on attentive graph convolutions, to address detailed classification with coarsely labeled datasets. Specifically, within a contrastive learning framework, our method generates labels that are interconnected through graph convolutions. Unlike self-supervised contrastive learning methods that link global and local image features to create a graph representing specific object characteristics, GLGM introduces a common search space that supports image retrieval, both in the educational field and in general, based on advanced sample distance search algorithms. We demonstrate that the GLGM method can encompass many state-of-the-art approaches as special cases. Experiments show that GLGM achieves significant improvements over existing advanced methods on various datasets, including CIFAR-10 and MLIC-Edu (a dataset we collected ourselves for the educational image retrieval domain).
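
To make the idea of retrieval by sample distance in a common search space concrete, here is a minimal sketch. It is not the GLGM method: the abstract does not specify the distance measure or the search algorithm, so cosine similarity and exhaustive search are assumptions made here, and the function names `cosine_similarity` and `retrieve_top_k` and the toy embeddings are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(
    query: Sequence[float],
    gallery: Sequence[Sequence[float]],
    k: int = 3,
) -> List[Tuple[int, float]]:
    """Return (gallery index, similarity) for the k most similar gallery items."""
    # Assumption: exhaustive comparison against every gallery embedding.
    scored = [(i, cosine_similarity(query, emb)) for i, emb in enumerate(gallery)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up 3-dimensional embeddings standing in for image features.
gallery = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
query = [1.0, 0.05, 0.0]

for index, score in retrieve_top_k(query, gallery, k=2):
    print(index, round(score, 4))
```

Exhaustive search costs time linear in the gallery size per query. For large galleries, an approximate nearest-neighbor index would typically replace the loop, but the ranking principle stays the same.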

Data availability

The paper uses public data from CIFAR-10 and our own dataset, MLIC-Edu. The use of data in this study follows the guidelines set by the datasets' authors.

Acknowledgements

This research is funded by the Posts and Telecommunications Institute of Technology (PTIT), Vietnam under grant number ‘12-2024-HV-CNTT1’. The authors would like to thank PTIT for the financial support.

Author information

Anh Dat Tran and Thi Thuy Quynh Dao have contributed equally to this work.

Authors and Affiliations

Academy of Finance, Faculty of Information Technology of ThuyLoi University, 11398, Ha Noi, Viet Nam
Van Thanh Nguyen
CMC University, 11398, Ha Noi, Viet Nam
Huu Quynh Nguyen
Faculty of Information Technology, ThuyLoi University, 11398, Ha Noi, Viet Nam
Anh Dat Tran
Faculty of Information Technology, Posts and Telecommunications Institute of Technology, 11398, Ha Noi, Viet Nam
Thi Thuy Quynh Dao

Contributions

All authors contributed to the study's conception and design. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Huu Quynh Nguyen.

Ethics declarations

Conflict of interest

No conflict of interest exists in the submission of this manuscript, and the manuscript has been approved by all authors for publication.

Ethics approval

Consent was obtained from all participants prior to their involvement in the study, and they were informed of their right to withdraw at any time without consequence.

Consent to participate

All authors agreed to participate in the construction and development of this research topic.

Consent to publication

All authors agreed to make this study public.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Nguyen, V.T., Nguyen, H.Q., Tran, A.D. et al. Multi-label guided graph attention network for education image retrieval. SIViP 19, 19 (2025). https://doi.org/10.1007/s11760-024-03630-2

Received: 05 September 2024
Revised: 06 October 2024
Accepted: 16 October 2024
Published: 01 December 2024
DOI: https://doi.org/10.1007/s11760-024-03630-2
