research-article

A Semi-supervised Learning Approach Based on Adaptive Weighted Fusion for Automatic Image Annotation

Authors:

Zhiping Shi

ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 17, Issue 1

Article No.: 37, Pages 1 - 23

https://doi.org/10.1145/3426974

Published: 16 April 2021

Abstract

Learning a well-performing image annotation model usually requires a large number of labeled samples. Although unlabeled samples are abundant and readily available, manually annotating large numbers of images is difficult. In this article, we propose a novel semi-supervised approach based on adaptive weighted fusion for automatic image annotation that uses labeled and unlabeled data simultaneously to improve annotation performance. First, two classifiers, one built on a support vector machine and the other on a convolutional neural network, are trained on different features extracted from the labeled data, so that the two classifiers represent different feature views. Then, the corresponding features of the unlabeled images are extracted and fed into the two classifiers, each of which produces a semantic annotation of the images. At the same time, the confidence of each image annotation is measured by an adaptive weighted fusion strategy. Next, the images whose semantic annotations have high confidence are submitted, together with those annotations, to the classifiers for retraining, and this is repeated until a stop condition is reached. As a result, we obtain a strong classifier that makes full use of the unlabeled data. Finally, we conduct experiments on four datasets, namely, Corel 5K, IAPR TC12, ESP Game, and NUS-WIDE, and measure the performance of our approach with standard criteria, including precision, recall, F-measure, N+, and mAP. The experimental results show that our approach outperforms many state-of-the-art approaches.
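
The abstract does not state the fusion rule, the confidence threshold, or the stop condition, so the following is only a minimal Python sketch of the retraining loop it describes, not the authors' implementation. Several things here are assumptions made for illustration: a nearest-centroid classifier stands in for both the SVM and the CNN, the fusion weights are taken proportional to each view's accuracy on the labeled set, the task is single-label rather than multi-label, and the names `CentroidClassifier`, `fit_views`, `annotate`, and `co_train` as well as the `threshold` and `max_rounds` parameters are hypothetical.

```python
import math


class CentroidClassifier:
    """Stand-in for one view's classifier (the article uses an SVM and a CNN)."""

    def fit(self, X, y):
        sums, counts = {}, {}
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, v in enumerate(x):
                acc[i] += v
            counts[label] = counts.get(label, 0) + 1
        self.centroids = {c: [v / counts[c] for v in s] for c, s in sums.items()}
        return self

    def predict_proba(self, x):
        # Softmax over negative squared distances to the class centroids.
        scores = {c: -sum((a - b) ** 2 for a, b in zip(x, m))
                  for c, m in self.centroids.items()}
        top = max(scores.values())
        exps = {c: math.exp(s - top) for c, s in scores.items()}
        total = sum(exps.values())
        return {c: e / total for c, e in exps.items()}


def accuracy(clf, X, y):
    preds = [max(clf.predict_proba(x).items(), key=lambda kv: kv[1])[0] for x in X]
    return sum(p == t for p, t in zip(preds, y)) / len(y)


def fit_views(Xa, Xb, y):
    """Train both views; weights are ASSUMED proportional to labeled-set accuracy."""
    clf_a, clf_b = CentroidClassifier().fit(Xa, y), CentroidClassifier().fit(Xb, y)
    acc_a, acc_b = accuracy(clf_a, Xa, y), accuracy(clf_b, Xb, y)
    w_a = acc_a / (acc_a + acc_b) if acc_a + acc_b > 0 else 0.5
    return clf_a, clf_b, w_a


def annotate(clf_a, clf_b, w_a, xa, xb):
    """Return (label, confidence) from the weighted fusion of both views."""
    pa, pb = clf_a.predict_proba(xa), clf_b.predict_proba(xb)
    fused = {c: w_a * pa[c] + (1.0 - w_a) * pb[c] for c in pa}
    return max(fused.items(), key=lambda kv: kv[1])


def co_train(Xa, Xb, y, Ua, Ub, threshold=0.9, max_rounds=5):
    Xa, Xb, y = list(Xa), list(Xb), list(y)
    pool = list(zip(Ua, Ub))
    for _ in range(max_rounds):
        clf_a, clf_b, w_a = fit_views(Xa, Xb, y)
        kept = []
        for xa, xb in pool:
            label, conf = annotate(clf_a, clf_b, w_a, xa, xb)
            if conf >= threshold:
                # A confident pseudo-label joins the training set of both views.
                Xa.append(xa)
                Xb.append(xb)
                y.append(label)
            else:
                kept.append((xa, xb))
        if len(kept) == len(pool):  # assumed stop condition: nothing was added
            break
        pool = kept
    clf_a, clf_b, w_a = fit_views(Xa, Xb, y)  # final retraining on all labels
    return clf_a, clf_b, w_a, len(y)


# Toy data: view A is 2-D, view B is 1-D; the third unlabeled image is ambiguous.
Xa = [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1], [5.1, 5.0]]
Xb = [[0.0], [0.2], [4.0], [4.2]]
y = ["sky", "sky", "grass", "grass"]
Ua = [[0.2, 0.1], [4.9, 5.0], [2.55, 2.55]]
Ub = [[0.1], [4.1], [2.1]]

clf_a, clf_b, w_a, n_labeled = co_train(Xa, Xb, y, Ua, Ub)
label, conf = annotate(clf_a, clf_b, w_a, [0.1, 0.2], [0.3])
```

On this toy data the two clear-cut unlabeled images are pseudo-labeled and added to the training set, while the ambiguous one stays in the unlabeled pool because its fused confidence is below the threshold. Swapping in a real SVM and CNN would only change `fit_views`; the loop structure would stay the same.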

Index Terms

A Semi-supervised Learning Approach Based on Adaptive Weighted Fusion for Automatic Image Annotation
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision tasks
        Visual content-based indexing and retrieval
  2. Machine learning

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications Volume 17, Issue 1

February 2021

392 pages

ISSN:1551-6857

EISSN:1551-6865

DOI:10.1145/3453992

Editor:
Alberto Del Bimbo
University of Firenze, Italy

Copyright © 2021 Association for Computing Machinery.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 16 April 2021

Accepted: 01 September 2020

Revised: 01 July 2020

Received: 01 February 2020

Published in TOMM Volume 17, Issue 1

Qualifiers

Research-article
Refereed

Funding Sources

National Natural Science Foundation of China
Guangxi Natural Science Foundation

Cited By

Kuang W, Li Z (2024). Multi-label image classification with multi-layered multi-perspective dynamic semantic representation. Machine Language 113:6 (3443-3461). Online publication date: 1-Jun-2024
https://dl.acm.org/doi/10.1007/s10994-023-06440-8
Salar A, Ahmadi A (2024). Improving loss function for deep convolutional neural network applied in automatic image annotation. The Visual Computer: International Journal of Computer Graphics 40:3 (1617-1629). Online publication date: 1-Mar-2024
https://dl.acm.org/doi/10.1007/s00371-023-02873-3
Oussama A, Khaldi B, Kherfi M (2023). A fast weighted multi-view Bayesian learning scheme with deep learning for text-based image retrieval from unlabeled galleries. Multimedia Tools and Applications 82:7 (10795-10812). Online publication date: 1-Mar-2023
https://dl.acm.org/doi/10.1007/s11042-022-13788-x
Zhu Q, Li Z, Kuang W, Ma H (2023). A multichannel location-aware interaction network for visual classification. Applied Intelligence 53:20 (23049-23066). Online publication date: 1-Oct-2023
https://dl.acm.org/doi/10.1007/s10489-023-04734-x
Zhu Q, Kuang W, Li Z (2023). Fusing bilinear multi-channel gated vector for fine-grained classification. Machine Vision and Applications 34:2. Online publication date: 7-Feb-2023
https://dl.acm.org/doi/10.1007/s00138-023-01378-2
Kuang W, Zhu Q, Li Z (2023). Multi-label Image Classification with Multi-scale Global-Local Semantic Graph Network. Machine Learning and Knowledge Discovery in Databases: Research Track (53-69). Online publication date: 18-Sep-2023
https://dl.acm.org/doi/10.1007/978-3-031-43418-1_4
Zhou W, Xia Z, Dou P, Su T, Hu H (2022). Aligning Image Semantics and Label Concepts for Image Multi-Label Classification. ACM Transactions on Multimedia Computing, Communications, and Applications 19:2 (1-23). Online publication date: 21-Jul-2022
https://dl.acm.org/doi/10.1145/3550278
Zhou W, Xia Z, Dou P, Su T, Hu H (2022). Double Attention Based on Graph Attention Network for Image Multi-Label Classification. ACM Transactions on Multimedia Computing, Communications, and Applications 19:1 (1-23). Online publication date: 12-Mar-2022
https://dl.acm.org/doi/10.1145/3519030
