A hybrid automatic image annotation approach

Multimedia Tools and Applications

Abstract

Automatic image annotation (AIA) is an important problem in computer vision and pattern recognition, and plays a key role in large-scale image retrieval. Many image annotation approaches process all regions of an image equally, which is inconsistent with the way humans understand images. To improve on the annotation performance of existing AIA approaches, a hybrid AIA approach based on a visual attention mechanism (VAM) and a conditional random field (CRF) is proposed. First, because people pay more attention to the salient region of an image during recognition, the VAM is used to separate the image into salient and non-salient regions. Second, a support vector machine (SVM) is used to annotate the salient region, and a k-nearest-neighbor (kNN) voting algorithm is used to annotate the non-salient regions. Finally, because annotation words (also called labels) are related to one another, a CRF is used to obtain the final label set for each given image. The experimental results indicate that the proposed hybrid AIA approach achieves good annotation performance.
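The last two stages described above (kNN voting for non-salient regions, then a label-relationship step that picks the final label set) can be illustrated with a toy sketch. This is not the authors' implementation: the abstract gives no formulas, so the function names, the two-dimensional features, the Jaccard co-occurrence measure, the exhaustive search, and the hard-coded `salient_scores` (standing in for SVM outputs) are all assumptions made for illustration. The pairwise scoring is only in the spirit of a CRF and is not a trained CRF.

```python
from collections import Counter
from itertools import combinations
import math


def knn_vote(query, train_feats, train_labels, k=2):
    """Score each label by the fraction of the k nearest training samples carrying it."""
    nearest = sorted(range(len(train_feats)),
                     key=lambda i: math.dist(query, train_feats[i]))[:k]
    votes = Counter()
    for i in nearest:
        votes.update(train_labels[i])
    return {label: count / k for label, count in votes.items()}


def label_affinity(label_sets):
    """Jaccard co-occurrence for every label pair seen together in the training annotations."""
    occurs, together = Counter(), Counter()
    for labels in label_sets:
        occurs.update(labels)
        together.update(frozenset(p) for p in combinations(sorted(labels), 2))
    return {pair: n / (sum(occurs[l] for l in pair) - n)
            for pair, n in together.items()}


def select_labels(unary, affinity, size=3, weight=1.0):
    """Exhaustively pick the label set maximizing per-label scores plus weighted pair affinities."""
    size = min(size, len(unary))
    best, best_score = (), -math.inf
    for subset in combinations(sorted(unary), size):
        score = sum(unary[l] for l in subset)
        score += weight * sum(affinity.get(frozenset(p), 0.0)
                              for p in combinations(subset, 2))
        if score > best_score:
            best, best_score = subset, score
    return list(best)


# Hypothetical toy data: 2-D region features and their training label sets.
train_feats = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (0.9, 1.0)]
train_labels = [{"sky", "sea", "boat"}, {"sky", "sea"}, {"grass", "tree"}, {"grass"}]

salient_scores = {"boat": 0.9, "tree": 0.95}  # assumed stand-in for SVM outputs
background_scores = knn_vote((0.05, 0.0), train_feats, train_labels, k=2)

# Merge both sources, keeping the higher score for each label.
unary = {l: max(salient_scores.get(l, 0.0), background_scores.get(l, 0.0))
         for l in set(salient_scores) | set(background_scores)}
affinity = label_affinity(train_labels)

print(select_labels(unary, affinity, weight=0.0))  # ['sea', 'sky', 'tree']
print(select_labels(unary, affinity, weight=1.0))  # ['boat', 'sea', 'sky']
```

With the pairwise term switched off (`weight=0.0`), the slightly higher individual score of "tree" wins a place in the label set. With it switched on, "boat" replaces "tree" because it co-occurs with "sky" and "sea" in the training annotations, which is the kind of label relationship the final stage is meant to exploit.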

Author information

Corresponding author

Correspondence to Cong Jin.

About this article

Cite this article

Jin, C., Sun, QM. & Jin, SW. A hybrid automatic image annotation approach. Multimed Tools Appl 78, 11815–11834 (2019). https://doi.org/10.1007/s11042-018-6742-6

  • DOI: https://doi.org/10.1007/s11042-018-6742-6
