Abstract
Automatic image annotation is a significant and challenging problem in pattern recognition and computer vision. Most existing image annotation models use all of the training images to estimate the joint generation probabilities between images and keywords, which inevitably brings in many irrelevant images. To address this problem, we propose a hierarchical image annotation model that combines the advantages of discriminative and generative models. In the first annotation layer, a discriminative model assigns topic annotations to unlabeled images, and the relevant image set corresponding to each unlabeled image is then obtained. In the second annotation layer, we propose a keyword-oriented method to establish links between images and keywords, and then use an iterative algorithm to expand the relevant image sets. Our method, which is based on visual keywords, gives candidate labels higher weights. Finally, a generative model assigns detailed annotations to unlabeled images using the expanded relevant image sets. Experiments on the Corel 5K dataset verify the effectiveness of our hierarchical image annotation model.
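The two-level flow described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not specify the models, so a nearest-centroid rule stands in for the discriminative first layer, and kernel-weighted keyword voting stands in for the generative second layer. The expansion rule, the function names (`assign_topic`, `keyword_scores`, `annotate`), the parameters (`bandwidth`, `rounds`, `n_labels`), and the toy data are all assumptions made for illustration.

```python
import math
from collections import defaultdict

# Each training item is (feature_vector, topic, keywords). Toy data, not Corel 5K.
TOY_TRAIN = [
    ([0.0, 0.0], "beach", ["sea", "sand"]),
    ([0.1, 0.0], "beach", ["sea", "sky"]),
    ([1.0, 1.0], "forest", ["tree", "grass"]),
    ([0.9, 1.0], "forest", ["tree", "sky"]),
]

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign_topic(query, train):
    """Layer 1 stand-in: pick the topic whose feature centroid is nearest.
    Assumption: the paper's discriminative model is replaced by this rule."""
    sums, counts = {}, defaultdict(int)
    for feat, topic, _ in train:
        acc = sums.setdefault(topic, [0.0] * len(feat))
        for i, v in enumerate(feat):
            acc[i] += v
        counts[topic] += 1
    centroids = {t: [v / counts[t] for v in acc] for t, acc in sums.items()}
    return min(centroids, key=lambda t: sq_dist(query, centroids[t]))

def keyword_scores(query, images, bandwidth=1.0):
    """Layer 2 stand-in: each image votes for its keywords with a weight that
    decays with visual distance to the query; scores are normalized to sum to 1."""
    scores = defaultdict(float)
    for feat, _, keywords in images:
        weight = math.exp(-sq_dist(query, feat) / bandwidth)
        for kw in keywords:
            scores[kw] += weight
    total = sum(scores.values()) or 1.0
    return {kw: s / total for kw, s in scores.items()}

def annotate(query, train, n_labels=2, rounds=2):
    """Topic first, then detailed labels estimated on an expanded relevant set."""
    topic = assign_topic(query, train)
    # Initial relevant set: training images that share the predicted topic.
    relevant = [i for i, (_, t, _) in enumerate(train) if t == topic]
    for _ in range(rounds):
        scores = keyword_scores(query, [train[i] for i in relevant])
        candidates = set(sorted(scores, key=scores.get, reverse=True)[:n_labels])
        # Hypothetical expansion rule: pull in images that share a candidate label.
        extra = [i for i, (_, _, kws) in enumerate(train)
                 if i not in relevant and candidates & set(kws)]
        if not extra:
            break
        relevant.extend(extra)
    scores = keyword_scores(query, [train[i] for i in relevant])
    return topic, sorted(scores, key=scores.get, reverse=True)[:n_labels]

topic, labels = annotate([0.05, 0.0], TOY_TRAIN)
print(topic, labels)
```

The point of the sketch is the restriction step: label scores are estimated only on images tied to the predicted topic, plus those reached through shared candidate keywords, rather than on the whole training set.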
Acknowledgment
This work was partially supported by the National Natural Science Foundation of China (No. 60873179, No. 60803078), the Research Fund for the Doctoral Program of Higher Education of China (No. 20090121110032), and the Shenzhen Municipal Science and Technology Planning Program for Basic Research of China (No. JC200903180630A). The authors also gratefully acknowledge the reviewers' helpful comments and suggestions, which have improved the paper.
Cite this article
Ke, X., Li, S. & Cao, D. A two-level model for automatic image annotation. Multimed Tools Appl 61, 195–212 (2012). https://doi.org/10.1007/s11042-010-0706-9
DOI: https://doi.org/10.1007/s11042-010-0706-9