Abstract
The general recipe of kernel methods, such as support vector machines (SVMs), includes a preliminary step of hand-crafting or designing similarity kernels. This process, which has been extensively studied during the last decade, has proven relatively successful in solving many pattern recognition problems, including image annotation. However, many proposed solutions for kernel design measure similarity between data using only their content, without context. In this paper, we propose an alternative that enhances usual kernels by making them context-aware. The method is based on the optimization of an objective function that mixes content, regularization and context. We show that the underlying kernel solution converges to a positive semi-definite fixed point, which can also be expressed as a dot product involving “explicit” kernel maps. When these explicit context-aware kernel maps are plugged into support vector machines, performance improves substantially and outperforms competitors on the hard task of image annotation, evaluated on a recent ImageCLEF annotation benchmark.
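The idea of a kernel fixed point that is also a dot product of explicit maps can be illustrated with a small sketch. The abstract does not give the update rule, so everything below is an assumption made for illustration: a linear content kernel `K0 = X Xᵀ`, a context matrix `P` obtained by scaling a hypothetical link matrix `A` to unit spectral norm, and the generic recursion `K ← K0 + γ P K Pᵀ` with `γ < 1`. It is not presented as the paper's objective or solution.

```python
import numpy as np

def context_aware_kernel(X, A, gamma=0.5, n_iter=5):
    """Illustrative recursion K <- K0 + gamma * P K P^T (assumed form).

    X: (n, d) content features; A: (n, n) hypothetical link matrix.
    Returns the kernel K and an explicit map Phi with K == Phi @ Phi.T.
    """
    K0 = X @ X.T                       # content-only kernel (PSD)
    norm = np.linalg.norm(A, 2)        # spectral norm of the link matrix
    P = A / norm if norm > 0 else A    # scale so that ||P||_2 <= 1
    K, Phi = K0.copy(), X.copy()
    for _ in range(n_iter):
        # Kernel update: content term plus context term.
        K = K0 + gamma * P @ K @ P.T
        # Matching explicit map: concatenate content and context features.
        Phi = np.hstack([X, np.sqrt(gamma) * (P @ Phi)])
    return K, Phi

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
A = (rng.random((6, 6)) < 0.4).astype(float)
np.fill_diagonal(A, 0.0)               # no self-links
K, Phi = context_aware_kernel(X, A)
print(np.allclose(K, Phi @ Phi.T))     # kernel equals dot product of maps
```

Under these assumptions each iterate is positive semi-definite by construction, because it is the Gram matrix of `Phi`, and the recursion is a contraction when `gamma` is below 1. The map dimension grows with every iteration in this naive form, which is a property of the sketch rather than a statement about the paper's maps.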
Notes
This also happens in the ImageCLEF database, used in our experiments, where “image–image” links correspond to kinds of relationships other than visual continuity.
It is important to emphasize that the tags used are different from the concepts (classes) used for training and annotation. Indeed, each image belongs to one or multiple concepts, which are different from these tags (see experiments).
This can be achieved by clustering an interconnected image database into disconnected groups, such as communities in social networks.
See http://www.imageclef.org/2013/photo for information about this challenge and its official results.
Again, this includes tags different from classes/concepts.
Acknowledgments
This work was supported in part by a grant from the research agency ANR (Agence Nationale de la Recherche) under the MLVIS project ANR-11-BS02-0017.
Cite this article
Sahbi, H. ImageCLEF annotation with explicit context-aware kernel maps. Int J Multimed Info Retr 4, 113–128 (2015). https://doi.org/10.1007/s13735-015-0082-3
DOI: https://doi.org/10.1007/s13735-015-0082-3