A sparse kernel relevance model for automatic image annotation

  • Regular Paper
  • International Journal of Multimedia Information Retrieval

Abstract

In this paper, we introduce a new form of the continuous relevance model (CRM), dubbed the SKL-CRM, that adaptively selects the best-performing kernel per feature type for automatic image annotation. Previous image annotation models apply a standard selection of kernels to model the distribution of image features. Popular examples include a Gaussian kernel for modelling GIST features or a Laplacian kernel for global colour histograms. In this work, we demonstrate that this standard assignment of kernels to feature types is sub-optimal and that substantially higher image annotation accuracy can be attained by adapting the kernel-feature assignment. We formulate an efficient greedy algorithm to find the best kernel-feature alignment and show that it rapidly finds a sparse subset of features that maximises the annotation \(F_{1}\) score. In a second contribution, we introduce two data-adaptive kernels for image annotation, the generalised Gaussian and multinomial kernels, which we demonstrate can model the distribution of image features better than standard kernels. Evaluation is conducted on three standard image datasets across a selection of different feature representations. The proposed SKL-CRM model is found to attain performance that is competitive with a suite of state-of-the-art image annotation models.
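
The greedy search described in the abstract can be illustrated with a minimal sketch. It is one plausible reading of the abstract, not the authors' algorithm: at each step it adds the single (feature, kernel) pair that most improves a validation score, and it stops when no remaining pair helps, which leaves unhelpful features out and so yields a sparse assignment. The function name `greedy_kernel_feature_selection`, the `score` callback (standing in for an annotation \(F_{1}\) evaluation on held-out data) and all feature and kernel names are hypothetical.

```python
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical representation: feature name -> name of the kernel assigned to it.
Assignment = Dict[str, str]


def greedy_kernel_feature_selection(
    features: Iterable[str],
    kernels: Iterable[str],
    score: Callable[[Assignment], float],
) -> Tuple[Assignment, float]:
    """Greedy forward selection of (feature, kernel) pairs.

    `score` is an assumed callback returning a validation metric
    (for example annotation F1) for a candidate assignment.
    """
    features = list(features)
    kernels = list(kernels)
    chosen: Assignment = {}
    best = float("-inf")

    while True:
        best_pair = None
        best_trial = best
        # Try every unused feature with every available kernel.
        for feature in features:
            if feature in chosen:
                continue
            for kernel in kernels:
                trial = dict(chosen)
                trial[feature] = kernel
                value = score(trial)
                if value > best_trial:
                    best_trial = value
                    best_pair = (feature, kernel)
        # Stop once no single addition improves the score.
        if best_pair is None:
            return chosen, best
        chosen[best_pair[0]] = best_pair[1]
        best = best_trial
```

Each round costs one `score` call per unused feature and kernel, so the total number of evaluations grows at most quadratically in the number of features times the number of kernels, rather than exponentially as an exhaustive search over all assignments would.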

Notes

  1. Users are known to find it particularly difficult to represent their image needs via abstract image features [23].

  2. In preliminary experiments, we also found that z-score normalisation has a similar effect, but for simplicity we report the max–min normalisation results in this paper.

  3. We use the terms Minkowski kernel and generalised Gaussian kernel interchangeably in this work.

  4. Features computed in a spatial arrangement are denoted with a V3H1 suffix in this paper.
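
Notes 2 and 3 can be made concrete with a short sketch, under stated assumptions. The max–min normalisation below rescales one feature dimension to the range [0, 1]. The kernel uses a common unnormalised generalised Gaussian form, exp(-Σ|x_d − y_d|^p / (p β^p)), which reduces to a Gaussian shape for p = 2 and a Laplacian shape for p = 1. This parameterisation, the omission of the normalising constant, and the names `min_max_normalise`, `generalised_gaussian_kernel`, `p` and `beta` are assumptions made for illustration and are not taken from the paper.

```python
import math
from typing import List, Sequence


def min_max_normalise(column: Sequence[float]) -> List[float]:
    """Rescale one feature dimension to [0, 1]; a constant column maps to zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(value - lo) / (hi - lo) for value in column]


def generalised_gaussian_kernel(
    x: Sequence[float],
    y: Sequence[float],
    p: float = 2.0,
    beta: float = 1.0,
) -> float:
    """Unnormalised generalised Gaussian (Minkowski) kernel.

    Assumed form: exp(-sum_d |x_d - y_d|**p / (p * beta**p)).
    p = 2 gives a Gaussian shape, p = 1 a Laplacian shape.
    """
    if p <= 0 or beta <= 0:
        raise ValueError("p and beta must be positive")
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    distance_p = sum(abs(a - b) ** p for a, b in zip(x, y))
    return math.exp(-distance_p / (p * beta ** p))
```

Treating the shape parameter `p` as a free value, rather than fixing it at 1 or 2, is what would let such a kernel adapt to the data.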

References

  1. von Ahn L, Dabbish L (2005) ESP: labeling images with a computer game. In: AAAI spring symposium: knowledge collection from volunteer contributors, pp 91–98

  2. Ames M, Naaman M (2007) Why we tag: motivations for annotation in mobile and online media. In: Proceedings of the SIGCHI conference on human factors in computing systems, CHI ’07. ACM, New York, pp 971–980

  3. Arandjelovic R, Zisserman A (2012) Three things everyone should know to improve object retrieval. In: CVPR. IEEE, New York, pp 2911–2918

  4. Barnard K, Duygulu P, Forsyth D, de Freitas N, Blei DM, Jordan MI (2003) Matching words and pictures. J Mach Learn Res 3:1107–1135

  5. Blei DM, Jordan MI (2003) Modeling annotated data. In: Proceedings of the 26th annual international ACM SIGIR conference on research and development in information retrieval, SIGIR ’03. ACM, New York, pp 127–134

  6. Carneiro G, Chan AB, Moreno PJ, Vasconcelos N (2007) Supervised learning of semantic classes for image annotation and retrieval. IEEE Trans Pattern Anal Mach Intell 29(3):394–410

  7. Chapelle O, Haffner P, Vapnik VN (1999) Support vector machines for histogram-based image classification. IEEE Trans Neural Netw 10(5):1055–1064

  8. Chen M, Zheng A, Weinberger KQ (2013) Fast image tagging. In: Dasgupta S, Mcallester D (eds) Proceedings of the 30th international conference on machine learning (ICML-13), vol 28, pp 1274–1282. JMLR workshop and conference proceedings

  9. Cooper WS (1995) Some inconsistencies and misidentified modeling assumptions in probabilistic information retrieval. ACM Trans Inf Syst 13(1):100–111

  10. Cusano C, Ciocca G, Schettini R (2003) Image annotation using SVM. In: Santini S, Schettini R (eds) Internet imaging V, society of photo-optical instrumentation engineers (SPIE) conference Series, vol 5304, pp 330–338

  11. Duygulu P, Barnard K, de Freitas JFG, Forsyth DA (2002) Object recognition as machine translation: learning a lexicon for a fixed image vocabulary. In: Proceedings of the 7th European conference on computer vision-part IV, ECCV ’02. Springer, London, pp 97–112

  12. Enser P, Sandom C, Lewis P (2005) Automatic annotation of images from the practitioner perspective. In: Image and video retrieval, pp 497–506

  13. Feng SL, Manmatha R, Lavrenko V (2004) Multiple bernoulli relevance models for image and video annotation. In: Proceedings of the 2004 IEEE computer society conference on computer vision and pattern recognition, CVPR’04. IEEE Computer Society, Washington, DC, pp 1002–1009

  14. Fu H, Zhang Q, Qiu G (2012) Random forest for image annotation. In: Proceedings of the 12th European conference on computer vision, ECCV’12, vol Part VI. Springer, Berlin, pp 86–99

  15. Grangier D, Bengio S (2008) A discriminative kernel-based approach to rank images from text queries. IEEE Trans Pattern Anal Mach Intell 30(8):1371–1384. doi:10.1109/TPAMI.2007.70791

  16. Grubinger M (2007) Analysis and evaluation of visual information systems performance. PhD thesis, School of Computer Science and Mathematics, Faculty of Health, Engineering and Science, Victoria University, Melbourne, Australia

  17. Guillaumin M, Mensink T, Verbeek J, Schmid C (2009) Tagprop: discriminative metric learning in nearest neighbor models for image auto-annotation. In: International conference on computer vision, pp 309–316

  18. Hentschel C, Stober S, Nürnberger A, Detyniecki M (2007) Automatic image annotation using a visual dictionary based on reliable image segmentation. In: Adaptive multimedia retrieval. Lecture Notes in Computer Science, vol 4918. Springer, Berlin, pp 45–56

  19. Howarth P, Rüger S (2005) Fractional distance measures for content-based image retrieval. In: Proceedings of the 27th European conference on advances in information retrieval research, ECIR’05. Springer, Berlin, pp 447–456

  20. Huang J, Kumar SR, Zabih R (1998) An automatic hierarchical image classification scheme. In: Proceedings of the Sixth ACM international conference on multimedia, MULTIMEDIA ’98. ACM, New York, pp 219–228

  21. Indyk P, Motwani R (1998) Approximate nearest neighbors: Towards removing the curse of dimensionality. In: Proceedings of the thirtieth annual ACM symposium on theory of computing, STOC ’98. ACM, New York, pp 604–613

  22. Jeon J, Lavrenko V, Manmatha R (2003) Automatic image annotation and retrieval using cross-media relevance models. In: Proceedings of the 26th annual international ACM SIGIR conference on research and development in Information retrieval, SIGIR ’03. ACM, New York, pp 119–126

  23. Jeon J, Manmatha R (2004) Using maximum entropy for automatic image annotation. In: CIVR. Lecture Notes in Computer Science, vol 3115. Springer, Berlin, pp 24–32

  24. Joachims T (2002) Optimizing search engines using clickthrough data. In: Proceedings of the eighth ACM SIGKDD international conference on knowledge discovery and data mining, KDD ’02. ACM, New York, pp 133–142

  25. Lavrenko V, Feng S, Manmatha R (2004) Statistical models for automatic video annotation and retrieval. ICASSP 3:1044–1047

  26. Lavrenko V, Manmatha R, Jeon J (2003) A model for learning the semantics of pictures. NIPS

  27. Lazebnik S, Schmid C, Ponce J (2006) Beyond bags of features: Spatial pyramid matching for recognizing natural scene categories. In: Proceedings of the 2006 IEEE computer society conference on computer vision and pattern recognition, CVPR ’06, vol 2. IEEE Computer Society, Washington, DC, pp 2169–2178

  28. Liu J, Li M, Liu Q, Lu H, Ma S (2009) Image annotation via graph learning. Pattern Recognit 42(2):218–228

  29. Lowe DG (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vis 60(2):91–110

  30. Makadia A, Pavlovic V, Kumar S (2008) A new baseline for image annotation. In: Proceedings of the 10th European conference on computer vision: part III, ECCV ’08. Springer, Berlin, pp 316–329

  31. Markkula M, Sormunen E (2000) End-user searching challenges indexing practices in the digital newspaper photo archive. Inf Retr 1(4):259–285

  32. Metzler D, Manmatha R (2004) An inference network approach to image retrieval. In: Proceedings of the international conference on image and video retrieval. Springer, Berlin, pp 42–50

  33. Mittelman R, Lee H, Kuipers B, Savarese S (2013) Weakly supervised learning of mid-level features with beta-bernoulli process restricted boltzmann machines. In: Proceedings of the 2013 IEEE conference on computer vision and pattern recognition, CVPR ’13. IEEE Computer Society, Washington, DC, pp 476–483

  34. Moran S, Lavrenko V (2011) Optimal tag sets for automatic image annotation. In: Proceedings of the British machine vision conference. BMVA Press, London, pp 1.1–1.11

  35. Moran S, Lavrenko V (2014) Sparse kernel learning for image annotation. In: Proceedings of international conference on multimedia retrieval, ICMR ’14. ACM, New York, pp 113:113–113:120

  36. Moran S, Lavrenko V, Osborne M (2013) Variable bit quantisation for LSH. In: Proceedings of the 51st annual meeting of the association for computational linguistics (vol 2: short papers). Association for Computational Linguistics, Sofia, pp 753–758

  37. Mori Y, Takahashi H, Oka R (1999) Image-to-word transformation based on dividing and vector quantizing images with words. In: MISRM’99 first international workshop on multimedia intelligent storage and retrieval management

  38. Nakayama H (2011) Linear distance metric learning for large-scale generic image recognition. PhD thesis, The University of Tokyo, Japan

  39. Oliva A, Schyns P (2000) Diagnostic colors mediate scene recognition. Cogn Psychol 41(2):176–210

  40. Oliva A, Torralba A (2001) Modeling the shape of the scene: a holistic representation of the spatial envelope. Int J Comput Vis 42(3):145–175

  41. Richtárik P, Takáč M (2013) Distributed coordinate descent method for learning with big data. In: CoRR’13

  42. Shi J, Malik J (2000) Normalized cuts and image segmentation. IEEE Trans Pattern Anal Mach Intell 22(8):888–905

  43. Smucker MD, Allan J, Carterette B (2007) A comparison of statistical significance tests for information retrieval evaluation. In: Proceedings of the sixteenth ACM conference on information and knowledge management, CIKM ’07. ACM, New York, pp 623–632

  44. Ulz MH, Moran SJ (2013) Optimal kernel shape and bandwidth for atomistic support of continuum stress. Model Simul Mater Sci Eng 21(8):085017

  45. Verma Y, Jawahar CV (2012) Image annotation using metric learning in semantic neighbourhoods. In: Proceedings of the 12th European conference on computer vision, ECCV’12, vol Part III. Springer, Berlin, pp 836–849

  46. Wang B, Li ZW, Yu N, Li M (2007) Image annotation in a progressive way. In: Proceedings of ICME, pp 811–814

  47. van de Weijer J, Schmid C (2006) Coloring local feature extraction. In: Proceedings of the 9th European conference on computer vision, ECCV’06, vol Part II. Springer, Berlin, pp 334–348

  48. Weinberger KQ, Saul LK (2009) Distance metric learning for large margin nearest neighbor classification. J Mach Learn Res 10:207–244

  49. Weston J, Bengio S, Usunier N (2010) Large scale image annotation: learning to rank with joint word-image embeddings. Mach Learn 81(1):21–35

  50. Xiang Y, Zhou X, Chua TS, Ngo CW (2009) A revisit of generative model for automatic image annotation using markov random fields. In: Proceedings of IEEE computer vision and pattern recognition, pp 1153–1160

  51. Yakhnenko O, Honavar V (2008) Annotating images and image objects using a hierarchical dirichlet process model. In: Proceedings of the 9th international workshop on multimedia data mining: held in conjunction with the ACM SIGKDD 2008, MDM ’08. ACM, New York, pp 1–7

  52. Verma Y, Jawahar CV (2013) Exploring SVM for image annotation in presence of confusing labels. In: Proceedings of the British machine vision conference. BMVA Press, London

  53. Yavlinsky A, Schofield E, Rüger S (2005) Automated image annotation using global features and robust nonparametric density estimation. In: Proceedings of the 4th international conference on image and video retrieval, CIVR’05. Springer, Berlin, pp 507–517

  54. Zhang S, Huang J, Huang Y, Yu Y, Li H, Metaxas DN (2010) Automatic image annotation using group sparsity. In: CVPR. IEEE, New York, pp 3312–3319

  55. Zhu S, Liu Y (2008) Image annotation refinement using semantic similarity correlation. In: ICPR’08

Acknowledgments

We thank the anonymous reviewer for their helpful comments.

Author information

Corresponding author

Correspondence to Sean Moran.

About this article

Cite this article

Moran, S., Lavrenko, V. A sparse kernel relevance model for automatic image annotation. Int J Multimed Info Retr 3, 209–229 (2014). https://doi.org/10.1007/s13735-014-0063-y

  • DOI: https://doi.org/10.1007/s13735-014-0063-y
