Abstract
In dealing with text or image data, it is often effective to represent them as histograms. Although recent Bayesian topic models such as latent Dirichlet allocation and its variants have proven successful at modeling histograms, they often suffer from the computational overhead of inferring a large number of hidden variables. In this paper we consider a different modeling strategy: forming a dictionary of base histograms whose convex combination yields the histogram of an observed text/image document. The dictionary entries are learned from data, which establishes direct or indirect associations between specific topics/keywords and the base histograms. Given a learned dictionary, the coding of an observed histogram provides succinct and salient information useful for classification. One of our main contributions is a very efficient dictionary learning algorithm based on Nesterov’s smooth optimization technique in conjunction with analytic solution methods for quadratic minimization sub-problems. Beyond its faster theoretical convergence rate, in actual running time our algorithm is 20–30 times faster than general-purpose optimizers such as interior-point methods. In classification/annotation tasks on several text/image datasets, our approach exhibits comparable or often superior performance to existing Bayesian models, while being significantly faster than their variational inference.
Notes
Typically, each visual codeword corresponds to a particular cluster in the feature space after clustering all features from data.
Hence, hereafter we often abuse the term document to refer to both text-based document and image of visual codewords.
The distance between two histogram vectors \(\mathbf{h}\) and \(\mathbf{h}'\) is defined as: \(d_{\chi ^2}(\mathbf{h},\mathbf{h}') = \sum _j \frac{(h_j-h'_j)^2}{h_j+h'_j}\) or \(d_{L_2}(\mathbf{h},\mathbf{h}')=\sum _j (h_j-h'_j)^2\).
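As a concrete illustration (this is our sketch, not code from the paper), the two distances can be computed as follows; the convention of skipping bins where both histograms are zero in the \(\chi^2\) denominator is our assumption:

```python
import numpy as np

def chi2_distance(h, hp):
    # d_chi2(h, h') = sum_j (h_j - h'_j)^2 / (h_j + h'_j);
    # bins where both histograms are zero contribute nothing (our convention).
    den = h + hp
    mask = den > 0
    return float(np.sum((h[mask] - hp[mask]) ** 2 / den[mask]))

def l2_distance(h, hp):
    # d_L2(h, h') = sum_j (h_j - h'_j)^2 (squared Euclidean distance)
    return float(np.sum((h - hp) ** 2))

h  = np.array([0.5, 0.3, 0.2])
hp = np.array([0.4, 0.4, 0.2])
```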
Here we typically assume \(M \ll V\).
We will see soon how this can be explicitly formulated.
This aims to find an approximate factorization of the data matrix \(\mathbf{H} = [\mathbf{h}_1,\dots ,\mathbf{h}_n] \approx \mathbf{X} \cdot \mathbf{A}\) (hence, of rank M), where \(\mathbf{A} = [{\varvec{\alpha }}_1, \dots , {\varvec{\alpha }}_n]\).
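A minimal sketch of the coding step under this model, assuming a fixed dictionary: each column of \(\mathbf{X}\) is a base histogram, and each coefficient vector is constrained to the probability simplex so that the reconstruction is a convex combination of the bases. The simplex-projection routine and the plain projected-gradient loop below are our illustrative choices, not the paper's Nesterov-based algorithm:

```python
import numpy as np

def project_simplex(v):
    # Euclidean projection of v onto the probability simplex
    # {a : a >= 0, sum(a) = 1}
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1.0))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)

def code_histogram(h, X, iters=500):
    # min_a ||h - X a||^2  s.t.  a >= 0, sum(a) = 1, via projected gradient
    M = X.shape[1]
    a = np.full(M, 1.0 / M)
    L = np.linalg.norm(X, 2) ** 2  # Lipschitz constant of the gradient
    for _ in range(iters):
        grad = X.T @ (X @ a - h)
        a = project_simplex(a - grad / L)
    return a
```

For example, a histogram generated as an exact convex combination of two base histograms should be coded back to (approximately) the mixing weights.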
\(f(y) \le f(x) + \nabla f(x)^{\top }(y-x) + \frac{L}{2} \Vert y-x\Vert _2^2\). It can be easily shown from (8).
The stopping criterion in the algorithm is when the relative change in the iterates or the objective values is below some threshold (e.g., \(10^{-4}\)).
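For instance, the relative-change test on the iterates can be sketched as below; the safeguard max(·, 1) against near-zero iterates is our assumption, and the same test can equally be applied to successive objective values:

```python
import numpy as np

def rel_change_converged(x_prev, x_new, tol=1e-4):
    # Stop when ||x_new - x_prev|| / max(||x_prev||, 1) falls below tol
    return np.linalg.norm(x_new - x_prev) <= tol * max(np.linalg.norm(x_prev), 1.0)
```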
We use fmincon() in Matlab, which implements this algorithm.
Available from http://www.cs.princeton.edu/~blei/lda-c/.
More extensive results on running times are demonstrated in the next sections.
We use the C++ implementation publicly available from http://www.cs.cmu.edu/~chongw/slda/.
Available at http://new-labelme.csail.mit.edu/Release3.0/.
The SLDA model of Wang et al. (2009) can either ignore or exploit the annotation terms in learning. In this paper we only test with the former model not only because there is less significant improvement with the annotation information as reported in Wang et al. (2009), but also due to the unavailability of the codes for the latter model.
Refer to the supplemental material for the performance of fairly standard existing approaches including Gaussian mixtures and (sparse) non-negative matrix factorization.
Available from http://vision.stanford.edu/lijiali/event_dataset/.
In the supplemental material, we also show the performance of fairly standard existing approaches including Gaussian mixtures and (sparse) non-negative matrix factorization.
References
Aharon M, Elad M, Bruckstein AM (2005) K-SVD and its non-negative variant for dictionary design. In: Proceedings of the SPIE conference on wavelets, pp 327–339
Asuncion A, Newman D (2007) UCI machine learning repository
Bach F, Mairal J, Ponce J (2012) Task-driven dictionary learning. IEEE Trans Pattern Anal Mach Intell 34(4):791–804
Barla A, Odone F, Verri A (2003) Histogram intersection kernel for image classification. In: International conference on image processing
Bayón L, Grau JM, Suárez PM (2002) A new formulation of the equivalent thermal in optimization of hydrothermal systems. Math Prob Eng 8(3):181–196
Beck A, Teboulle M (2009) A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J Imaging Sci 2(1):183–202
Blei D, Jordan M (2003) Modeling annotated data. In: ACM SIGIR conference
Blei D, McAuliffe J (2007) Supervised topic models. In: Neural information processing systems
Blei D, Ng A, Jordan M (2003) Latent Dirichlet allocation. J Mach Learn Res 3:993–1022
Bolovinou A, Pratikakis I, Perantonis S (2012) Bag of spatio-visual words for context inference in scene classification. Pattern Recognit. doi:10.1016/j.patcog.2012.07.024
Bosch A, Zisserman A, Munoz X (2006) Scene classification via pLSA. In: European conference on computer vision
Boyd S, Vandenberghe L (2004) Convex optimization. Cambridge University Press, Cambridge
Chen SS, Donoho DL, Saunders MA (1998) Atomic decomposition by basis pursuit. SIAM J Sci Comput 20(1):33–61
Coates A, Lee H, Ng AY (2011) An analysis of single layer networks in unsupervised feature learning. In: International conference on Artificial Intelligence and Statistics (AISTATS)
Coleman TF, Li Y (1996) A reflective Newton method for minimizing a quadratic function subject to bounds on some of the variables. SIAM J Optim 6(4):1040–1058
CVX Research Inc. (2012) CVX: Matlab software for disciplined convex programming, version 2.0 beta. http://cvxr.com/cvx
Efron B, Hastie T, Johnstone I, Tibshirani R (2004) Least angle regression. Ann Stat 32(2):407–499
Fei-Fei L, Perona P (2005) A Bayesian hierarchical model for learning natural scene categories. In: IEEE international conference on computer vision and pattern recognition
Gill PE, Murray W, Wright MH (1981) Practical optimization. Academic Press, London
Grant M, Boyd S (2008) Graph implementations for nonsmooth convex programs. In: Recent advances in learning and control. Springer, London, pp 95–110
Ho ND, Dooren PV (2008) Non-negative matrix factorization with fixed row and column sums. Linear Algebra Appl 429(5–6):1020–1025
Hofmann T (1999) Probabilistic latent semantic analysis. In: Proceedings of uncertainty in artificial intelligence
Hoyer PO (2004) Non-negative matrix factorization with sparseness constraints. J Mach Learn Res 5:1457–1469
Kiros R, Szepesvári C (2012) Deep representations and codes for image auto-annotation. In: Advances in Neural Information Processing Systems (NIPS)
Kreutz-Delgado K, Murray JF, Rao BD, Engan K, Lee TW, Sejnowski TJ (2003) Dictionary learning algorithms for sparse representation. Neural Comput 15(2):349–396
Li LJ, Fei-Fei L (2007) What, where and who? Classifying events by scene and object recognition. In: IEEE international conference on computer vision
Lowe DG (2004) Distinctive image features from scale-invariant keypoints. Int J Comput Vis 60(2):91–110
Mairal J, Bach F, Ponce J, Sapiro G (2009) Online dictionary learning for sparse coding. In: International conference on machine learning
Nesterov Y (2005) Smooth minimization of non-smooth functions. Math Prog 103(1):127–152
Osborne MR, Presnell B, Turlach B (2000) A new approach to variable selection in least squares problems. IMA J Numer Anal 20(3):389–403
Pele O, Werman M (2010) The quadratic-chi histogram distance family. In: European conference on computer vision
Perkins S, Theiler J (2003) Online feature selection using grafting. In: International conference on machine learning (ICML)
Platt J (1999) Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. In: Smola AJ, Bartlett P, Schölkopf B, Schuurmans D (eds) Advances in large margin classifiers. MIT Press, Cambridge
Polyak BT (1987) Introduction to optimization. Optimization Software Inc., New York
Rosset S (2004) Tracking curved regularized optimization solution paths. In: Advances in neural information processing systems. MIT Press, Cambridge
Rubinstein R, Bruckstein A, Elad M (2010) Dictionaries for sparse representation modeling. Proc IEEE 98(6):1045–1057
Rubner Y, Tomasi C, Guibas L (2000) The earth mover’s distance as a metric for image retrieval. Int J Comput Vis 40(2):99–121
Russell B, Torralba A, Murphy K, Freeman WT (2008) LabelMe: a database and web-based tool for image annotation. Int J Comput Vis 77(1–3):157–173
Sindhwani V, Ghoting A (2012) Large-scale distributed non-negative sparse coding and sparse dictionary learning. In: International conference on knowledge discovery and data mining
Thurau C, Kersting K, Bauckhage C (2009) Convex non-negative matrix factorization in the wild. In: International conference on data mining
Tibshirani R (1994) Regression shrinkage and selection via the lasso. J R Stat Soc B 58:267–288
Tosic I, Frossard P (2011) Dictionary learning. IEEE Signal Proc Mag 28(2):27–38
Wang C, Blei DM, Fei-Fei L (2009) Simultaneous image classification and annotation. In: IEEE international conference on computer vision and pattern recognition
Wang Y, Jia Y, Hu C, Turk M (2004) Fisher non-negative matrix factorization for learning local features. In: Asian conference on computer vision
Yang AY, Zhou Z, Ganesh A, Sastry SS, Ma Y (2013) Fast \(l_1\)-minimization algorithms for robust face recognition. IEEE Trans Image Process 22(8):3234–3246
Acknowledgments
This study is supported by National Research Foundation of Korea (NRF-2013R1A1A1076101).
Responsible editor: Bing Liu.
Cite this article
Kim, M. Efficient histogram dictionary learning for text/image modeling and classification. Data Min Knowl Disc 31, 203–232 (2017). https://doi.org/10.1007/s10618-016-0461-2