A context-aware semantic modeling framework for efficient image retrieval

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

In recent years, high-level image representations have been gaining popularity in image classification and retrieval tasks. This paper proposes an efficient scheme, the semantic context model, for deriving high-level image descriptors well suited to retrieval. The semantic context model uses an undirected graphical model formulation that jointly exploits low-level visual features and contextual information to classify local image blocks into a set of predefined concept classes. The contextual information comprises concept co-occurrences and their spatial correlation statistics. More expressive potential functions are introduced to capture the structural dependencies among semantic concepts. The proposed framework proceeds in three steps. First, optimal values of the model parameters, which impose spatial consistency of concept labels among local image blocks, are learned from the training data. Then, the semantics associated with the constituent blocks of an unseen image are inferred using an improved message-passing algorithm. Finally, a compact but discriminative image signature is derived by integrating the frequency of occurrence of the various regional semantics. Experimental results on several benchmark datasets show that the semantic context model can effectively resolve local ambiguities and consequently improve concept recognition performance in complex images. Moreover, the retrieval efficiency of the new semantics-based image feature is found to be much better than that of state-of-the-art approaches.
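
The last two steps of the pipeline can be illustrated with a small sketch. It is not the authors' implementation: it substitutes plain synchronous loopy belief propagation on a 4-connected block grid for the paper's improved message-passing algorithm, and it takes the signature to be a normalised histogram of the inferred block labels. The function names, the two-class setup, and the pairwise compatibility values are all hypothetical, and parameter learning (the first step) is skipped in favour of a hand-set compatibility matrix.

```python
import numpy as np

def loopy_bp_grid(unary, pairwise, n_iters=10):
    """Synchronous sum-product loopy BP on a 4-connected grid of image blocks.

    unary:    (H, W, K) strictly positive per-block class scores (assumed given).
    pairwise: (K, K) strictly positive label compatibility (assumed, not learned).
    Returns normalised beliefs of shape (H, W, K).
    """
    H, W, K = unary.shape
    # msgs[d][i, j] is the message arriving at block (i, j) from its neighbour
    # above (d=0), below (d=1), to the left (d=2), or to the right (d=3).
    msgs = np.ones((4, H, W, K))
    for _ in range(n_iters):
        belief = unary * msgs.prod(axis=0)
        new = np.ones_like(msgs)
        # Dividing the belief by the message received from the target neighbour
        # leaves the product over all other neighbours, as sum-product requires.
        down = (belief / msgs[1]) @ pairwise   # sent to the block below
        up = (belief / msgs[0]) @ pairwise     # sent to the block above
        right = (belief / msgs[3]) @ pairwise  # sent to the block on the right
        left = (belief / msgs[2]) @ pairwise   # sent to the block on the left
        new[0, 1:, :] = down[:-1, :]
        new[1, :-1, :] = up[1:, :]
        new[2, :, 1:] = right[:, :-1]
        new[3, :, :-1] = left[:, 1:]
        # Normalise each message so values stay in a stable numeric range.
        msgs = new / new.sum(axis=-1, keepdims=True)
    belief = unary * msgs.prod(axis=0)
    return belief / belief.sum(axis=-1, keepdims=True)

def semantic_signature(beliefs):
    """Label each block by its most probable concept and histogram the labels."""
    labels = beliefs.argmax(axis=-1)
    counts = np.bincount(labels.ravel(), minlength=beliefs.shape[-1])
    return labels, counts / counts.sum()

# Toy 3x3 image with two concepts: every block favours concept 0 except the
# centre block, whose local evidence weakly favours concept 1.
unary = np.tile([0.9, 0.1], (3, 3, 1))
unary[1, 1] = [0.45, 0.55]
pairwise = np.array([[2.0, 1.0], [1.0, 2.0]])  # hypothetical: neighbours tend to agree

labels, signature = semantic_signature(loopy_bp_grid(unary, pairwise, n_iters=5))
print(labels)
print(signature)
```

In this toy case the centre block's ambiguous local evidence is overridden by its neighbours, which is the kind of local ambiguity the contextual terms are meant to resolve. The resulting signature has one entry per concept class, so its length is independent of image size.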



Author information

Corresponding author

Correspondence to K. S. Arun.

About this article

Cite this article

Arun, K.S., Govindan, V.K. A context-aware semantic modeling framework for efficient image retrieval. Int. J. Mach. Learn. & Cyber. 8, 1259–1285 (2017). https://doi.org/10.1007/s13042-016-0498-y


  • DOI: https://doi.org/10.1007/s13042-016-0498-y
