ImageCLEF annotation with explicit context-aware kernel maps

  • Regular Paper
International Journal of Multimedia Information Retrieval

Abstract

The general recipe of kernel methods, such as support vector machines (SVMs), includes a preliminary step of hand-crafting or designing similarity kernels. This process, which has been extensively studied during the last decade, has proven relatively successful in solving many pattern recognition problems, including image annotation. However, many proposed solutions for kernel design measure similarity between data by taking into account only their content, without context. In this paper, we propose an alternative that enhances usual kernels by making them context-aware. The method is based on the optimization of an objective function mixing content, regularization, and context. We show that the underlying kernel solution converges to a positive semi-definite fixed point, which can also be expressed as a dot product involving “explicit” kernel maps. When these explicit context-aware kernel maps are plugged into support vector machines, performance improves substantially and exceeds that of competitors on the hard task of image annotation, using a recent ImageCLEF annotation benchmark.
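To make the idea of an explicit map behind a context-aware kernel more concrete, here is a minimal sketch. The abstract does not state the update rule, so the code assumes a simple illustrative one, K ← K0 + γ P K Pᵀ, where K0 is a content kernel and P is a row-normalized adjacency matrix of image links. This is not the paper's objective function or its solution. The function names, the parameter `gamma`, and the toy data are all hypothetical. Under this assumed rule, the kernel stays positive semi-definite because it is always the Gram matrix of the concatenated map Φ ← [Φ0 | √γ · P Φ], and the iteration contracts for γ < 1 because P K Pᵀ averages entries of K.

```python
import math

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]

def row_normalize(adj):
    """Scale each row of an adjacency matrix to sum to 1 (isolated nodes stay 0)."""
    out = []
    for row in adj:
        s = sum(row)
        out.append([v / s if s else 0.0 for v in row])
    return out

def context_aware_maps(phi0, adj, gamma=0.5, iters=4):
    """Assumed illustrative update: Phi <- [Phi0 | sqrt(gamma) * P @ Phi].

    The Gram matrix of the result follows K <- K0 + gamma * P K P^T,
    so it is positive semi-definite by construction.
    """
    p = row_normalize(adj)
    phi = [row[:] for row in phi0]
    for _ in range(iters):
        ctx = matmul(p, phi)  # average the current maps of linked items
        phi = [r0 + [math.sqrt(gamma) * v for v in rc]
               for r0, rc in zip(phi0, ctx)]
    return phi

def gram(phi):
    """Kernel matrix as dot products of explicit maps."""
    return matmul(phi, transpose(phi))

# Toy data (hypothetical): three items with 2-d content features,
# linked in a chain 0 - 1 - 2.
phi0 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
phi = context_aware_maps(phi0, adj, gamma=0.5, iters=4)
kernel = gram(phi)
```

Because the map is explicit, `phi` can be passed directly to a linear classifier instead of a kernel machine. In this sketch the map dimension grows by the content dimension at every iteration, so a practical variant would need to cap or compress it.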

Notes

  1. This also happens in the ImageCLEF database, used in our experiments, where “image–image” links correspond to other kinds of relationships rather than visual continuity.

  2. See, for instance, the periodic and challenging ImageCLEF benchmark [59, 82].

  3. Such as stochastic gradient descent [10]. When the kernel map is explicit, the complexity of this method is linear in the size of the training data instead of quadratic (as in [67, 68]).

  4. It is important to emphasize that the tags used are different from the concepts (classes) used for training and annotation. Indeed, each image belongs to one or multiple concepts, which are different from these tags (see experiments).

  5. This can be achieved by clustering an interconnected image database into disconnected groups, such as communities in social networks.

  6. See http://www.imageclef.org/2013/photo for information about this challenge and its official results.

  7. Again, this includes tags different from classes/concepts.

  8. http://imageclef.org/2013/photo/annotation/results.

References

  1. Bahlmann C, Haasdonk B, Burkhardt H (2002) On-line handwriting recognition with support vector machines, a kernel approach. In: Proceedings of IWFHR, pp 49–54

  2. Barnard K, Duygulu P, Forsyth D, Blei D, Jordan M (2003) Matching words and pictures. J Mach Learn Res 3:1107–1135

  3. Belkin M, Niyogi P (2004) Semi-supervised learning on manifolds. Mach Learn 56:209–239

  4. Belkin M, Niyogi P (2003) Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comp 15(6):1373–1396

  5. Belkin M, Niyogi P (2006) Manifold regularization: a geometric framework for learning from examples. J Mach Learn Res 7:2399–2434

  6. Benavent X, Castellanos A, de Ves E, Hernández-Aranda D, Granados R, Garcia-Serrano A (2013) A multimedia ir-based system for the photo annotation task at imageclef2013. In: Proceedings of CLEF 2013 evaluation labs and workshop, online working notes. Valencia, Spain, 23–26 Sept 2013

  7. Bertelli L, Yu T, Vu D, Gokturk B (2011) Kernelized structural svm learning for supervised object segmentation. In: Proceedings of computer vision and pattern recognition (CVPR), IEEE Conference, IEEE, pp 2153–2160

  8. Blei DM, Jordan MI (2003) Modeling annotated data. In: Proceedings of the 26th annual international ACM SIGIR conference on research and development in information retrieval, SIGIR ’03, ACM, New York, pp 127–134

  9. Borgne H, Popescu A, Znaidia A (2013) Cea list@imageclef 2013: scalable concept image annotation. In: Proceedings of CLEF 2013 evaluation labs and workshop, online working notes. Valencia, Spain, 23–26 Sept 2013

  10. Bottou L (2010) Large scale machine learning with stochastic gradient descent. In: Proceedings of the 19th international conference on computational statistics, pp 177–187

  11. Boughorbel S, Tarel J, Boujemaa N (2005) The intermediate matching kernel for image local features. In: Proceedings of IEEE international joint conference on neural networks, vol 2, pp 889–894

  12. Boykov Y, Veksler O, Zabih R (2001) Fast approximate energy minimization via graph cuts. Pattern Anal Mach Intell IEEE Trans 23(11):1222–1239

  13. Cao L, Luo J, Huang T (2008) Annotating photo collection by label propagation according to multiple similarity cues. ACM Multimedia

  14. Carneiro G, Chan AB, Moreno PJ, Vasconcelos N (2007) Supervised learning of semantic classes for image annotation and retrieval. Pattern Anal Mach Intell IEEE Trans 29(3):394–410

  15. Carson C, Thomas M, Belongie S, Hellerstein JM, Malik J (1999) Blobworld: a system for region-based image indexing and retrieval. In: Proceedings of third international conference on visual information systems, pp 509–516

  16. Chang E, Goh K, Sychay G, Wu G (2003) Cbsa: content-based soft annotation for multimodal image retrieval using bayes point machines. Circuits Syst Video Technol IEEE Trans 13(1):26–38

  17. Cusano C, Ciocca G, Schettini R (2003) Image annotation using svm. In: Proceedings of electronic imaging 2004, International Society for Optics and Photonics, pp 330–338

  18. Davis M, King S, Good N, Sarvas R (2004) From context to content: leveraging context to infer media metadata. In: Proceedings of 12th annual ACM international conference on multimedia, MM 2004, Brave new topics session on from context to content: leveraging contextual metadata to infer multimedia Content, ACM Press, New York, pp 188–195

  19. Deng J, Dong W, Socher R, Li L-J, Li K, Fei-Fei L (2009) Imagenet: a large-scale hierarchical image database. In: Proceedings of computer vision and pattern recognition, CVPR 2009. IEEE Conference, IEEE, pp 248–255

  20. Duygulu P, Barnard K, deFreitas J, Forsyth D (2002) Object recognition as machine translation: learning a lexicon for a fixed image vocabulary. In: Heyden A, Sparr G, Nielsen M, Johansen P (eds) ECCV 2002, LNCS, vol 2353. Springer, Heidelberg, pp 97–112

  21. Feng S, Manmatha R, Lavrenko V (2004) Multiple bernoulli relevance models for image and video annotation. In: Proceedings of ICCV, pp 1002–1009

  22. Gallagher A, Neustaedter C, Cao L, Luo J, Chen T (2008) Image annotation using personal calendars as context. ACM Multimedia

  23. Gao Y, Fan J, Xue X, Jain R (2006) Automatic image annotation by incorporating feature hierarchy and boosting to scale up svm classifiers. In: Proceedings of ACM Multimedia

  24. Gartner T (2003) A survey of kernels for structured data. Multi Relat Data Min 5(1):49–58

  25. Gómez-Chova L, Camps-Valls G, Munoz-Mari J, Calpe J (2008) Semisupervised image classification with laplacian support vector machines. Geosci Remote Sens Lett IEEE 5(3):336–340

  26. Grana C, Serra G, Manfredi M, Cucchiara R, Martoglia R, Mandreoli F (2013) Unimore at imageclef 2013: scalable concept image annotation. In: Proceedings of CLEF 2013 evaluation labs and workshop, online working notes, Valencia, Spain, 23–26 Sept 2013

  27. Grangier D, Bengio S (2008) A discriminative kernel-based approach to rank images from text queries. Pattern Anal Mach Intell IEEE Trans 30(8):1371–1384

  28. Grauman K, Darrell T (2007) The pyramid match kernel: efficient learning with sets of features. J Mach Learn Res (JMLR) 8:725–760

  29. Guillaumin M, Mensink T, Verbeek J, Schmid C (2009) Tagprop: discriminative metric learning in nearest neighbor models for image auto-annotation. In: Proceedings of computer vision, IEEE 12th international conference, IEEE, pp 309–316

  30. Gupta M, Li R, Yin Z, Han J (2010) Survey on social tagging techniques. SIGKDD Explor 12(1):58–72

  31. Hanjalic A (2012) A new gap to bridge: where to go next in social media retrieval? In: Schoeffmann K, Mérialdo B, Hauptmann AG, Ngo C-W, Andreopoulos Y, Breiteneder C (eds) Advances in Multimedia Modeling, 18th International Conference, MMM 2012. Lecture notes in Computer Science, vol 7131. Springer, Heidelberg

  32. He X, Zemel RS, Carreira-Perpiñán MA (2004) Multiscale conditional random fields for image labeling. In: Proceedings of the 2004 IEEE computer society conference on computer vision and pattern recognition, CVPR 2004, vol 2. IEEE, pp 695–702

  33. Hidaka M, Gunji N, Harada T (2013) Mil at imageclef 2013: scalable system for image annotation. In: Proceedings of CLEF 2013 evaluation labs and workshop, online working notes. Valencia, Spain, 23–26 Sept 2013

  34. Hironobu YM, Takahashi H, Oka R (1999) Image-to-word transformation based on dividing and vector quantizing images with words. In: Proceedings of Boltzmann machines, neural networks, pp 405–409

  35. Jeon J, Lavrenko V, Manmatha R (2003) Automatic image annotation and retrieval using cross-media relevance models. In: Proceedings of ACM SIGIR, pp 119–126

  36. Jin R, Chai JY, Si L (2004) Effective automatic image annotation via a coherent language model and active learning. In: Proceedings of the 12th annual ACM international conference on Multimedia, ACM, pp 892–899

  37. Jin Y, Khan L, Wang L, Awad M (2005) Image annotations by combining multiple evidence and wordnet. In: Proceedings of ACM Multimedia, pp 706–715

  38. Kang F, Jin R, Sukthankar R (2006) Correlated label propagation with application to multi-label learning. In: Proceedings of computer vision and pattern recognition, IEEE Computer Society Conference, vol 2. IEEE, pp 1719–1726

  39. Kondor R, Jebara T (2003) A kernel between sets of vectors. In: Proceedings of the 20th international conference on machine learning

  40. Krizhevsky A, Sutskever I, Hinton G (2012) Imagenet classification with deep convolutional neural networks. Adv Neural Inf Process Syst 25:1106–1114

  41. Lavrenko V, Manmatha R, Jeon J (2004) A model for learning the semantics of pictures. In: Proceedings of NIPS

  42. Li J, Wang JZ (2003) Automatic linguistic indexing of pictures by a statistical modeling approach. IEEE Trans PAMI 25(9):1075–1088

  43. Li J, Wang JZ (2008) Real-time computerized annotation of pictures. Pattern Anal Mach Intell IEEE Trans 30(6):985–1002

  44. Li X, Liao S, Liu B, Yang G, Jin Q, Xu J, Du X (2013) Renmin University of China at imageclef 2013 scalable concept image annotation. In: Proceedings of CLEF 2013 evaluation labs and workshop, online working notes. Valencia, Spain, 23–26 Sept 2013

  45. Li X, Snoek C, Worring M (2008) Learning tag relevance by neighbor voting for social image retrieval. In: Proceedings of MIR conference

  46. Liu J, Li M, Liu Q, Lu H, Ma S (2009) Image annotation via graph learning. Pattern Recognit 42(2):218–228

  47. Liu J, Wang B, Li M, Li Z, Ma W, Lu H, Ma S (2007) Dual cross-media relevance model for image annotation. In: Proceedings of ACM Multimedia, pp 605–614

  48. Liu W, Tao D (2013) Multiview hessian regularization for image annotation. Image Process IEEE Trans 22(7):2676–2687

  49. Liu W, Tao D, Cheng J, Tang Y (2014) Multiview hessian discriminative sparse coding for image annotation. Comput Vis Image Underst 118:50–60

  50. Lyu S (2005) Mercer kernels for object recognition with local features. In: Proceedings of the IEEE computer vision and pattern recognition

  51. Maji S, Berg AC, Malik J (2013) Efficient classification for additive kernel svms. IEEE PAMI 35(1):66–77

  52. Makadia A, Pavlovic V, Kumar S (2008) A new baseline for image annotation. In: Forsyth D, Torr P, Zisserman A (eds) Computer vision—ECCV 2008, 10th European conference on computer vision. Lecture notes in computer science, vol 5304. Springer, Heidelberg, pp 316–329

  53. Mei T, Wang Y, Hua X-S, Gong S, Li S (2008) Coherent image annotation by learning semantic distance. In: Proceedings of computer vision and pattern recognition, CVPR, IEEE conference, IEEE, pp 1–8

  54. Monay F, Gatica Perez D (2004) Plsa-based image autoannotation: constraining the latent space. In: Proceedings of ACM international conference on multimedia

  55. Moran S, Lavrenko V (2014) A sparse kernel relevance model for automatic image annotation. Int J Multimed Inf Retr 3(4):209–229

  56. Moreno P, Ho P, Vasconcelos N (2003) A kullback-leibler divergence based kernel for svm classification in multimedia applications. In: Proceedings of neural information processing systems

  57. Moser G, Serpico B (2012) Combining support vector machines and markov random fields in an integrated framework for contextual image classification. In: Proceedings of TGRS

  58. Narayanan H, Belkin M, Niyogi P (2006) On the relation between low density separation, spectral clustering and graph cuts. In: Proceedings of advances in neural information processing systems, pp 1025–1032

  59. Nowak S, Huiskes M (2010) New strategies for image annotation: overview of the photo annotation task at imageclef 2010. In: Proceedings of the working notes of CLEF 2010

  60. Nowozin S, Lampert CH (2011) Structured learning and prediction in computer vision. Found Trends Comput Gr Vis 6(3–4):185–365

  61. Pan J-Y, Yang H-J, Faloutsos C, Duygulu P (2004) Automatic multimedia cross-modal correlation discovery. In: Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining, ACM, pp 653–658

  62. Rakotomamonjy A, Bach F, Canu S, Grandvalet Y (2008) SimpleMKL. JMLR 9:2491–2521

  63. Reshma IA, Ullah MZ, Aono M (2013) Kdevir at imageclef 2013 image annotation subtask. In: Proceedings of CLEF 2013 evaluation labs and workshop, online working notes, Valencia, Spain, 23–26 Sept 2013

  64. Ritendra D, Joshi D, Li J, Wang J (2008) Image retrieval: ideas, influences, and trends of the new age. In: Proceedings of ACM computing surveys

  65. Sahbi H (2013) Explicit context-aware kernel map learning for image annotation. In: Proceedings of the 9th international conference on computer vision systems

  66. Sahbi H, Audibert J, Keriven R (2007) Graph cut transducers for relevance feedback in content based image retrieval. In: Proceedings of the IEEE conference on computer vision

  67. Sahbi H, Audibert J-Y, Keriven R (2011) Context-dependent kernels for object classification. In: Proceedings of pattern analysis and machine intelligence (PAMI), vol 4, issue 33

  68. Sahbi H, Li X (2010) Context based support vector machines for interconnected image annotation (the Saburo Tsuji best regular paper award). In: Proceedings of the Asian conference on computer vision (ACCV)

  69. Sánchez-Oro J, Montalvo S, Montemayor AS, Pantrigo JJ, Duarte A, Fresno V, Martínez R (2013) Urjc&uned at imageclef 2013 photo annotation task. In: Proceedings of CLEF 2013 evaluation labs and workshop, online working notes. Valencia, Spain, 23–26 Sept 2013

  70. Semenovich D, Sowmya A (2010) Geometry aware local kernels for object recognition. In: Proceedings of ACCV

  71. Shawe-Taylor J, Cristianini N (2000) Support vector machines and other kernel-based learning methods. Cambridge University Press, Cambridge

  72. Singhal A, Jiebo L, Weiyu Z (2003) Probabilistic spatial context models for scene content understanding. In: Proceedings of CVPR

  73. Srikanth M, Varner J, Bowden M, Moldovan D (2005) Exploiting ontologies for automatic image annotation. In: Proceedings of SIGIR, pp 552–558

  74. Stone Z, Zickler T, Darrell T (2008) Auto-tagging facebook: social network context improves photo annotation. In: Proceedings of IVW

  75. Taskar B, Chatalbashev V, Koller D, Guestrin C (2005) Learning structured prediction models: a large margin approach. In: Proceedings of the 22nd international conference on machine learning, ACM, pp 896–903

  76. Tong W, Jin R (2007) Semi-supervised learning by mixed label propagation. Proc Natl Conf Artif Intell 22(1):651

  77. Torralba A, Murphy K, Freeman W (2007) Sharing visual features for multiclass and multiview object detection. In: Proceedings of IEEE transactions on pattern analysis and machine intelligence (PAMI) vol 25, issue 5

  78. Tsochantaridis I, Joachims T, Hofmann T, Altun Y (2005) Large margin methods for structured and interdependent output variables. In: Proceedings of journal of machine learning research, pp 1453–1484

  79. Uricchio T, Bertini M, Ballan L, Del Bimbo A (2013) Micc-unifi at imageclef 2013 scalable concept image annotation. In: Proceedings of CLEF 2013 evaluation labs and workshop, online working notes, Valencia, Spain, 23–26 Sept 2013

  80. Vapnik VN (1998) Statistical learning theory. Wiley-Interscience Publication, New York

  81. Vedaldi A, Zisserman A (2012) Efficient additive kernels via explicit feature maps. IEEE PAMI 34(3):480–492

  82. Villegas M, Paredes R, Thomee B (2013) Overview of the imageclef 2013 scalable concept image annotation subtask. In: Proceedings of CLEF 2013 evaluation labs and workshop, online working notes

  83. Vo P, Sahbi H (2012) Transductive kernel map learning and its application to image annotation. In: Proceedings of the British machine vision conference (BMVC)

  84. Wallraven C, Caputo B, Graf A (2003) Recognition with local features: the kernel recipe. In: Proceedings of ICCV, pp 257–264

  85. Wang C, Jing F, Zhang L, Zhang H (2006) Image annotation refinement using random walk with restarts. In: Proceedings of ACM Multimedia, pp 647–650

  86. Wang Y, Gong S (2007) Translating topics to words for image annotation. In: Proceedings of ACM CIKM

  87. Wu L, Hua X-S, Yu N, Ma W-Y, Li S (2008) Flickr distance. In: Proceedings of the 16th ACM international conference on multimedia, ACM, pp 31–40

  88. Wu L, Hua X-S, Yu N, Ma W-Y, Li S (2012) Flickr distance: a relationship measure for visual concepts. IEEE Trans Pattern Anal Mach Intell 34(5):863–875

  89. Yakhnenko O, Honavar V (2008) Annotating images and image objects using a hierarchical dirichlet process model. In: Proceedings of the 9th international workshop on multimedia data mining: held in conjunction with the ACM SIGKDD, ACM, pp 1–7

  90. Yang Y-H, Wu P-T, Lee C-W, Lin K-H, Hsu W, Chen H (2008) Contextseer: context search and recommendation at query time for shared consumer photos. In: Proceedings of ACM Multimedia

  91. Zhang H, Berg AC, Maire M, Malik J (2006) Svm-knn: discriminative nearest neighbor classification for visual category recognition. In: Proceedings of computer vision and pattern recognition, 2006 IEEE computer society conference, vol 2. IEEE, pp 2126–2136

  92. Zhang J, Marszalek M, Lazebnik S, Schmid C (2006) Local features and kernels for classification of texture and object categories: a comprehensive study. In: Proceedings of the beyond patches workshop, in conjunction with CVPR2006

  93. Zhou D, Bian J, Zheng S, Zha H, Giles CL (2008) Exploring social annotations for information retrieval. In: Proceedings of the 17th international conference on World Wide Web, ACM, pp 715–724

Acknowledgments

This work was supported in part by a Grant from the Research Agency ANR (Agence Nationale de la Recherche) under the MLVIS project ANR-11-BS02-0017.

Author information

Corresponding author

Correspondence to Hichem Sahbi.

About this article

Cite this article

Sahbi, H. ImageCLEF annotation with explicit context-aware kernel maps. Int J Multimed Info Retr 4, 113–128 (2015). https://doi.org/10.1007/s13735-015-0082-3

  • DOI: https://doi.org/10.1007/s13735-015-0082-3
