Optimizing visual dictionaries for effective image retrieval

  • Regular Paper
International Journal of Multimedia Information Retrieval

Abstract

Characterizing images by high-level concepts from a learned visual dictionary is widely used in image classification and retrieval. This paper addresses the problem of inferring discriminative visual dictionaries for effective image retrieval and examines a non-negative visual dictionary learning scheme to that end. Specifically, it proposes a non-negative matrix factorization framework with an \(\ell _0\)-sparseness constraint on the coefficient matrix for optimizing the dictionary. The method is a two-step iterative process composed of sparse encoding and dictionary enhancement stages. An initial estimate of the visual dictionary is updated in each iteration with the proposed \(\ell _0\)-constraint gradient projection algorithm. A desirable attribute of this formulation is an adaptive sequential dictionary initialization procedure, which leads to a sharp drop in the approximation error and faster convergence. Finally, the proposed dictionary optimization scheme is used to derive a compact image representation for the retrieval task. A new image signature is obtained by projecting local descriptors onto the basis elements of the optimized visual dictionary and then aggregating the resulting sparse encodings into a single feature vector. Experimental results on various benchmark datasets show that the proposed system can infer enhanced visual dictionaries and that the derived image feature vector achieves better retrieval results than state-of-the-art techniques.
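
The two-step loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract gives no update rules, so the code below assumes a plain projected-gradient scheme with hard thresholding for the sparse encoding stage, a projected-gradient step for the dictionary stage, random initialization from data columns in place of the adaptive sequential initialization, and sum pooling with \(\ell _2\) normalization for the image signature. All function names and parameters (`learn_dictionary`, `n_atoms`, `k`, and so on) are hypothetical.

```python
import numpy as np


def project_l0_nonneg(H, k):
    """Project each column of H onto {h >= 0, at most k nonzeros} (assumes k >= 1)."""
    H = np.maximum(H, 0.0)
    if k < H.shape[0]:
        # Row indices of all but the k largest entries in every column.
        drop = np.argsort(H, axis=0)[:-k, :]
        np.put_along_axis(H, drop, 0.0, axis=0)
    return H


def sparse_encode(X, W, k, n_steps=50):
    """Sparse encoding stage (assumed form): projected gradient on
    0.5 * ||X - W H||_F^2 with H >= 0 and at most k nonzeros per column."""
    step = 1.0 / (np.linalg.norm(W, 2) ** 2 + 1e-12)  # 1 / Lipschitz constant
    H = project_l0_nonneg(W.T @ X, k)
    for _ in range(n_steps):
        H = project_l0_nonneg(H - step * (W.T @ (W @ H - X)), k)
    return H


def update_dictionary(X, W, H, n_steps=20):
    """Dictionary enhancement stage (assumed form): projected gradient on W >= 0."""
    step = 1.0 / (np.linalg.norm(H, 2) ** 2 + 1e-12)
    for _ in range(n_steps):
        W = np.maximum(W - step * ((W @ H - X) @ H.T), 0.0)
    return W


def learn_dictionary(X, n_atoms, k, n_iters=15, seed=0):
    """Alternate the two stages; X holds one non-negative descriptor per column."""
    rng = np.random.default_rng(seed)
    # Assumption: random data columns as the initial dictionary. The paper's
    # adaptive sequential initialization is not reproduced here.
    W = X[:, rng.choice(X.shape[1], n_atoms, replace=False)].copy()
    errors = []
    for _ in range(n_iters):
        H = sparse_encode(X, W, k)
        W = update_dictionary(X, W, H)
        errors.append(np.linalg.norm(X - W @ H))  # Frobenius approximation error
    return W, errors


def image_signature(descriptors, W, k):
    """Encode an image's local descriptors and sum-pool the sparse codes
    into one L2-normalized vector (the pooling choice is an assumption)."""
    H = sparse_encode(descriptors, W, k)
    v = H.sum(axis=1)
    return v / (np.linalg.norm(v) + 1e-12)
```

With descriptors stacked column-wise in `X`, a call such as `W, errors = learn_dictionary(X, n_atoms=64, k=5)` returns a non-negative dictionary, and `image_signature(D, W, k=5)` maps the descriptor matrix `D` of one image to a vector of length `n_atoms`. Signatures of two images can then be compared by a dot product.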

Author information

Corresponding author

Correspondence to K. S. Arun.

About this article

Cite this article

Arun, K.S., Govindan, V.K. Optimizing visual dictionaries for effective image retrieval. Int J Multimed Info Retr 4, 165–185 (2015). https://doi.org/10.1007/s13735-015-0076-1

  • DOI: https://doi.org/10.1007/s13735-015-0076-1
