Abstract
Automated image annotation (AIA) is an important problem in computer vision and pattern recognition, and it plays a key role in large-scale image retrieval. Many image annotation approaches process all regions of an image equally, which is inconsistent with how humans understand images. To improve the annotation performance of existing AIA approaches, a hybrid AIA approach based on a visual attention mechanism (VAM) and a conditional random field (CRF) is proposed. First, because people pay more attention to the salient region of an image during recognition, the VAM is used to separate the image into salient and non-salient regions. Second, a support vector machine (SVM) annotates the salient region, and a k-nearest neighbor (kNN) voting algorithm annotates the non-salient regions. Finally, because annotation words (also called labels) are related to one another, the CRF is used to obtain the final label set of each given image. The experimental results indicate that the proposed hybrid AIA approach achieves good annotation performance.
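As an illustration of the kNN voting step only, the following is a minimal sketch and not the authors' implementation. The abstract does not specify the region features, the distance measure, or the value of k, so the 2-D feature vectors, the Euclidean distance, `k=2`, and the function name `knn_vote` are all assumptions made for this example.

```python
from collections import Counter
import math


def knn_vote(query, train_features, train_labels, k=3):
    """Return {label: fraction of the k nearest neighbors carrying that label}.

    Hypothetical helper: Euclidean distance and simple majority-style voting
    are assumptions, not details taken from the paper.
    """
    # Rank training regions by Euclidean distance to the query feature vector.
    ranked = sorted(
        range(len(train_features)),
        key=lambda i: math.dist(query, train_features[i]),
    )
    votes = Counter()
    for i in ranked[:k]:
        # Each neighbor casts one vote for every label it carries.
        votes.update(train_labels[i])
    return {label: count / k for label, count in votes.items()}


# Toy, made-up features and label sets standing in for non-salient regions.
train_features = [(0.0, 0.0), (0.1, 0.2), (0.9, 1.0), (1.0, 0.8)]
train_labels = [{"sky"}, {"sky", "cloud"}, {"grass"}, {"grass", "tree"}]

scores = knn_vote((0.05, 0.1), train_features, train_labels, k=2)
print(scores)
```

In a pipeline like the one described above, scores of this kind could be combined with the SVM output for the salient region before the CRF step selects the final label set; how that combination is done is not stated in the abstract.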






Cite this article
Jin, C., Sun, QM. & Jin, SW. A hybrid automatic image annotation approach. Multimed Tools Appl 78, 11815–11834 (2019). https://doi.org/10.1007/s11042-018-6742-6
DOI: https://doi.org/10.1007/s11042-018-6742-6