Abstract
Characterizing images by high-level concepts drawn from a learned visual dictionary is widely used in image classification and retrieval. This paper addresses the problem of inferring discriminative visual dictionaries for effective image retrieval and examines a non-negative visual dictionary learning scheme to this end. Specifically, it proposes a non-negative matrix factorization framework that optimizes the dictionary under an \(\ell _0\)-sparseness constraint on the coefficient matrix. The framework is a two-step iterative process consisting of a sparse encoding stage and a dictionary enhancement stage. An initial estimate of the visual dictionary is updated in each iteration with the proposed \(\ell _0\)-constraint gradient projection algorithm. A desirable attribute of this formulation is an adaptive sequential dictionary initialization procedure, which leads to a sharp drop in the approximation error and faster convergence. Finally, the proposed dictionary optimization scheme is used to derive a compact image representation for the retrieval task. A new image signature is obtained by projecting local descriptors onto the basis elements of the optimized visual dictionary and then aggregating the resulting sparse encodings into a single feature vector. Experimental results on various benchmark datasets show that the proposed system infers enhanced visual dictionaries and that the derived image feature vector achieves better retrieval results than state-of-the-art techniques.
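The two-step iteration and the signature construction described above can be illustrated with a minimal NumPy sketch. It is not the authors' algorithm: the fixed step sizes, the random initialization from data columns (used here in place of the adaptive sequential initialization), the sum-pooling aggregation, and all function and parameter names (`project_l0_nonneg`, `sparse_encode`, `update_dictionary`, `learn_dictionary`, `image_signature`, `k`, `n_atoms`) are assumptions made for illustration only.

```python
import numpy as np

def project_l0_nonneg(h, k):
    """Project onto {h >= 0, at most k nonzeros}: clip, then keep the k largest."""
    h = np.maximum(h, 0.0)                      # returns a new array
    if k < h.size:
        drop = np.argsort(h)[: h.size - k]      # indices of all but the k largest
        h[drop] = 0.0
    return h

def sparse_encode(D, X, k, n_steps=50):
    """Sparse encoding stage: projected gradient on H with D held fixed."""
    step = 1.0 / (np.linalg.norm(D, 2) ** 2 + 1e-12)   # assumed fixed step, 1/L
    H = np.zeros((D.shape[1], X.shape[1]))
    for _ in range(n_steps):
        G = D.T @ (D @ H - X)                   # gradient of 0.5*||X - DH||_F^2 in H
        H = H - step * G
        for j in range(H.shape[1]):
            H[:, j] = project_l0_nonneg(H[:, j], k)
    return H

def update_dictionary(D, X, H, n_steps=20):
    """Dictionary stage: projected gradient on D with H held fixed."""
    step = 1.0 / (np.linalg.norm(H, 2) ** 2 + 1e-12)   # assumed fixed step, 1/L
    for _ in range(n_steps):
        G = (D @ H - X) @ H.T                   # gradient of 0.5*||X - DH||_F^2 in D
        D = np.maximum(D - step * G, 0.0)       # keep the dictionary non-negative
    return D

def learn_dictionary(X, n_atoms, k, n_iters=30, seed=0):
    """Alternate the two stages; returns the dictionary and per-iteration errors."""
    rng = np.random.default_rng(seed)
    # Assumption: plain random data columns as initial atoms.
    D = X[:, rng.choice(X.shape[1], size=n_atoms, replace=False)].copy()
    errors = []
    for _ in range(n_iters):
        H = sparse_encode(D, X, k)
        D = update_dictionary(D, X, H)
        errors.append(np.linalg.norm(X - D @ H))
    return D, errors

def image_signature(D, descriptors, k):
    """Encode one image's descriptors and aggregate the codes into one vector."""
    H = sparse_encode(D, descriptors, k)
    v = H.sum(axis=1)                           # assumed aggregation: sum pooling
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

# Synthetic stand-in for local descriptors: 200 non-negative 16-D columns.
rng = np.random.default_rng(0)
X = rng.random((16, 200))
D, errors = learn_dictionary(X, n_atoms=8, k=3)
sig = image_signature(D, X[:, :40], k=3)
print(D.shape, sig.shape, errors[0], errors[-1])
```

In this sketch each descriptor is a column of `X`, each dictionary atom is a column of `D`, and `k` bounds the number of atoms a descriptor may use. The signature has one entry per atom, so its length is independent of the number of descriptors in the image.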
Cite this article
Arun, K.S., Govindan, V.K. Optimizing visual dictionaries for effective image retrieval. Int J Multimed Info Retr 4, 165–185 (2015). https://doi.org/10.1007/s13735-015-0076-1
DOI: https://doi.org/10.1007/s13735-015-0076-1