Simultaneous dimensionality reduction and dictionary learning for sparse representation based classification

Multimedia Tools and Applications

Abstract

Learning dictionaries from training data has led to promising results in pattern classification, where dimensionality reduction (DR) is also an important issue. However, most existing methods perform DR and dictionary learning (DL) independently, which may fail to fully exploit the discriminative information in the training data. In this paper, we propose a simultaneous dimensionality reduction and dictionary learning (SDRDL) model that learns a DR projection matrix and a class-specific dictionary (i.e., the dictionary atoms correspond to the class labels) at the same time. Because learning them jointly makes the projection and the dictionary fit each other better, more effective pattern classification can be achieved using the representation residual. In the SDRDL model, both the representation residual and the representation coefficients are discriminative, and we present a classification scheme associated with SDRDL that exploits this discriminative information. Experimental results on a series of benchmark image databases show that the proposed method outperforms many state-of-the-art discriminative dictionary learning methods.
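
To make the residual-based decision rule concrete, here is a minimal sketch; it is not the SDRDL classifier itself. It assumes that a projection matrix `P` and a class-specific dictionary `D` have already been learned, replaces sparse coding with a ridge-regularized least-squares code for brevity, and ignores the coefficient-based term mentioned in the abstract. The function name `classify_by_residual`, the `atom_labels` array and the `ridge` parameter are illustrative assumptions.

```python
import numpy as np

def classify_by_residual(y, P, D, atom_labels, ridge=1e-3):
    """Assign y to the class whose atoms reconstruct its projection best.

    y           : raw sample, shape (d,)
    P           : DR projection matrix, shape (m, d)       (assumed given)
    D           : dictionary in the reduced space, (m, K)   (assumed given)
    atom_labels : class label of every atom, shape (K,)
    ridge       : l2 penalty weight, used here in place of sparse coding
    """
    x = P @ y
    K = D.shape[1]
    # Ridge-regularized code over the whole dictionary (stand-in for sparse coding).
    z = np.linalg.solve(D.T @ D + ridge * np.eye(K), D.T @ x)
    classes = np.unique(atom_labels)
    # Residual when only the atoms and coefficients of class c are kept.
    residuals = [
        np.linalg.norm(x - D[:, atom_labels == c] @ z[atom_labels == c])
        for c in classes
    ]
    return classes[int(np.argmin(residuals))]


# Toy example: two classes living in orthogonal coordinate subspaces.
P = np.eye(4)
D = np.eye(4)
atom_labels = np.array([0, 0, 1, 1])
label_a = classify_by_residual(np.array([1.0, 0.5, 0.05, 0.0]), P, D, atom_labels)
label_b = classify_by_residual(np.array([0.0, 0.1, 1.0, 1.0]), P, D, atom_labels)
```

In the toy example the first sample lies almost entirely in the span of the class-0 atoms and the second in the span of the class-1 atoms, so the smallest residual picks the matching class in each case.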


References

  1. Aharon M, Elad M, Bruckstein A (2006) K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation. IEEE Trans Signal Process 54(11):4311–4322

  2. Beck A, Teboulle M (2009) A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J Imag Sci 2(1):183–202

  3. Belhumeur PN, Hespanha JP, Kriegman DJ (1997) Eigenfaces vs. Fisherfaces: recognition using class specific linear projection. IEEE Trans Pattern Anal Mach Intell 19(7):711–720

  4. Bengio S, Pereira F, Singer Y, Strelow D (2009) Group sparse coding. In: Proceedings of the Neural Information Processing Systems

  5. Boyd S, Vandenberghe L (2004) Convex optimization. Cambridge University Press, New York

  6. Bryt O, Elad M (2008) Compression of facial images using the k-svd algorithm. J Vis Commun Image Represent 19(4):270–282

  7. Cai S, Zuo W, Zhang L, Feng X, Wang P (2014) Support vector guided dictionary learning. In: Computer Vision–ECCV. pp 624–639

  8. Candès EJ et al (2006) Compressive sampling. In: Proceedings of the international congress of mathematicians, vol. 3. Madrid, Spain, pp 1433–1452

  9. Castrodad A, Sapiro G (2012) Sparse modeling of human actions from motion imagery. Int J Comput Vis 100:1–15

  10. Elad M, Aharon M (2006) Image denoising via sparse and redundant representations over learned dictionaries. IEEE Trans Image Process 15(12):3736–3745

  11. Elad M, Aharon M (2006) Image denoising via learned dictionaries and sparse representation. In: Computer Vision and Pattern Recognition, vol. 1. pp 895–900

  12. Feng Z, Yang M, Zhang L, Liu Y, Zhang D (2013) Joint discriminative dimensionality reduction and dictionary learning for face recognition. Pattern Recogn 46(8):2134–2143

  13. Georghiades A, Belhumeur P, Kriegman D (2001) From few to many: illumination cone models for face recognition under variable lighting and pose. IEEE Trans Pattern Anal Mach Intell 23(6):643–660

  14. Guha T, Ward RK (2012) Learning sparse representations for human action recognition. IEEE Trans Pattern Anal Mach Intell 34(8):1576–1588

  15. Hoyer PO (2002) Non-negative sparse coding. In: Proceedings of the IEEE Workshop Neural Networks for Signal Processing

  16. Huang K, Aviyente S (2006) Sparse representation for signal classification. In: Advances in neural information processing systems. pp 609–616

  17. Jenatton R, Mairal J, Obozinski G, Bach F (2011) Proximal methods for hierarchical sparse coding. J Mach Learn Res 12:2234–2297

  18. Jiang ZL, Zhang GX, Davis LS (2012) Submodular dictionary learning for sparse coding. In: Proceedings of the IEEE Conference Computer Vision and Pattern Recognition

  19. Jiang ZL, Lin Z, Davis LS (2013) Label consistent K-SVD: learning a discriminative dictionary for recognition. IEEE Trans Pattern Anal Mach Intell 34:533

  20. Kong S, Wang DH (2012) A dictionary learning approach for classification: Separating the particularity and the commonality. In: Proceedings of the European Conference on Computer Vision

  21. Mairal J, Elad M, Sapiro G (2008a) Sparse representation for color image restoration. IEEE Trans Image Process 17(1):53–69

  22. Mairal J, Bach F, Ponce J, Sapiro G, Zisserman A (2008b) Learning discriminative dictionaries for local image analysis. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

  23. Mairal J, Leordeanu M, Bach F, Hebert M, Ponce J (2008c) Discriminative sparse image models for class-specific edge detection and image interpretation. In: Proceedings of the European Conference on Computer Vision

  24. Mairal J, Bach F, Ponce J, Sapiro G, Zisserman A (2009) Supervised dictionary learning. In: Proceedings of the Neural Information and Processing Systems

  25. Mairal J, Bach F, Ponce J (2012) Task-driven dictionary learning. IEEE Trans Pattern Anal Mach Intell 34(4):791–804

  26. Martinez A, Benavente R (1998) The AR face database, CVC Technical Report 24

  27. He X, Niyogi P (2004) Locality preserving projections. In: Neural information processing systems, vol. 16. MIT, p 153

  28. Olshausen BA, Field DJ (1997) Sparse coding with an overcomplete basis set: a strategy employed by V1? Vis Res 37(23):3311–3325

  29. Olshausen BA et al (1996) Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature 381(6583):607–609

  30. Petrou M, Bosdogianni P (1999) Image processing: the fundamentals. Wiley

  31. Pham D, Venkatesh S (2008) Joint learning and dictionary construction for pattern recognition. In: Proceedings of the IEEE Conference Computer Vision and Pattern Recognition

  32. Qiu Q, Jiang ZL, Chellappa R (2011) Sparse dictionary-based representation and recognition of action attributes. In: Proceedings of the International Conference on Computer Vision

  33. Ramirez I, Sprechmann P, Sapiro G (2010) Classification and clustering via dictionary learning with structured incoherence and shared features. In: Computer Vision and Pattern Recognition (CVPR), IEEE Conference on. IEEE, 2010, pp 3501–3508

  34. Rodriguez F, Sapiro G (2007) Sparse representation for image classification: Learning discriminative and reconstructive nonparametric dictionaries. Preprint: IMA, p 2213

  35. Rodriguez M, Ahmed J, Shah M (2008) A spatio-temporal maximum average correlation height filter for action recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

  36. Sadanand S, Corso JJ (2012) Action bank: a high-level representation of activity in video. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

  37. Sprechmann P, Sapiro G (2010) Dictionary learning and sparse coding for unsupervised clustering. In: Proceedings of the International Conference on Acoustics Speech and Signal Processing

  38. Szabo Z, Poczos B, Lorincz A (2011) Online group-structured dictionary learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

  39. Turk M, Pentland AP et al (1991) Face recognition using eigenfaces. In: Computer Vision and Pattern Recognition. pp. 586–591

  40. Wagner A, Wright J, Ganesh A, Zhou Z, Mobahi H, Ma Y (2012) Toward a practical face recognition system: robust alignment and illumination by sparse representation. IEEE Trans Pattern Anal Mach Intell 34(2):372–386

  41. Wang HR, Yuan CF, Hu WM, Sun CY (2012) Supervised class-specific dictionary learning for sparse modeling in action recognition. Pattern Recogn 45(11):3902–3911

  42. Wright J, Yang AY, Ganesh A, Sastry SS, Ma Y (2009a) Robust face recognition via sparse representation. IEEE Trans Pattern Anal Mach Intell 31(2):210–227

  43. Wright SJ, Nowak RD, Figueiredo MAT (2009b) Sparse reconstruction by separable approximation. IEEE Trans Signal Process 57(7):2479–2493

  44. Wu YN, Si ZZ, Gong HF, Zhu SC (2010) Learning active basis model for object detection and recognition. Int J Comput Vis 90:198–235

  45. Yang M, Zhang L (2010) Gabor feature based sparse representation for face recognition with gabor occlusion dictionary. In: Computer Vision–ECCV 2010. Springer, pp 448–461

  46. Yang JC, Yu K, Huang T (2010a) Supervised translation-invariant sparse coding. In: Proceedings of the IEEE Conference Computer Vision and Pattern Recognition

  47. Yang M, Zhang L, Yang J, Zhang D (2010b) Metaface learning for sparse representation based face recognition. In: Proceedings of the IEEE Conference on Image Processing

  48. Yang M, Zhang L, Yang J, Zhang D (2011a) Robust sparse coding for face recognition. In: Computer Vision and Pattern Recognition (CVPR). pp 625–632

  49. Yang M, Zhang L, Feng XC, Zhang D (2011b) Fisher discrimination dictionary learning for sparse representation. In: Proceedings of the International Conference on Computer Vision

  50. Yang M, Zhang L, Feng XC, Zhang D (2014) Sparse representation based fisher discrimination dictionary learning for image classification. Int J Comput Vis 109(3):209–232

  51. Yao A, Gall J, Gool LV (2010) A hough transform-based voting framework for action recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

  52. Zhang Q, Li B (2010) Discriminative k-svd for dictionary learning in face recognition. In: Computer Vision and Pattern Recognition (CVPR). pp 2691–2698

  53. Zhang L, Yang M, Feng Z, Zhang D (2010) On the dimensionality reduction for sparse representation based face recognition. In: Pattern Recognition (ICPR), 2010 20th International Conference on IEEE. pp 1237–1240

  54. Zhou N, Fan JP (2012) Learning inter-related visual dictionary for object recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

  55. Zhou MY, Chen HJ, Paisley J, Ren L, Li LB, Xing ZM et al (2012) Nonparametric Bayesian dictionary learning for analysis of noisy and incomplete images. IEEE Trans Image Process 21(1):130–144


Acknowledgments

This work was supported by the National Instrument Development Special Program of China under grants 2013YQ03065101 and 2013YQ03065105; the Ministry of Science and Technology of China, National Basic Research Project, under grant 2010CB731803; the National Natural Science Foundation of China under grants 61221003, 61290322, 61174127, 61273181, 60934003, 61503243 and U1405251; the Program of New Century Talents in University of China under grant NCET-13-0358; the Science and Technology Commission of Shanghai Municipality, China, under grant 13QA1401900; and the Postdoctoral Science Foundation of China under grant 2014M551406.

Author information

Corresponding authors

Correspondence to Bao-Qing Yang or Chao-Chen Gu.

Appendix

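We show that \( {\varphi}_i\left({Z}_i\right) \) is convex and continuously differentiable with a Lipschitz continuous gradient, i.e., there exists a constant \( L\left({\varphi}_i\right) \) such that: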

$$ \left\Vert \nabla {\varphi}_i(x)-\nabla {\varphi}_i(y)\right\Vert \le L\left({\varphi}_i\right)\left\Vert x-y\right\Vert, \forall x,y\in {R}^n. $$
(21)

where \( \left\Vert \cdot \right\Vert \) denotes the standard Euclidean norm and \( L\left({\varphi}_i\right)>0 \) is the Lipschitz constant of \( \nabla {\varphi}_i \).

In Eq. (10), \( {\varphi}_i \) is given by:

$$ {\varphi}_i\left({Z}_i\right)={\left\Vert {B}_i-D{Z}_i\right\Vert}_F^2+{\left\Vert {B}_i-{D}_i{Z}_i^i\right\Vert}_F^2+{\displaystyle {\sum}_{j\ne i}{\left\Vert {D}_j{Z}_i^j\right\Vert}_F^2}+{\lambda}_2{\displaystyle {\sum}_{j\ne i}{\left\Vert {\tilde{Z}}_j^T{Z}_i\right\Vert}_F^2} $$
(22)

Let \( {Z}_i^i={P}^i{Z}_i \) and \( {Z}_i^j={P}^j{Z}_i \), where \( {P}^i \) (\( {P}^j \)) is a projection matrix that keeps the components of \( {Z}_i \) associated with \( {D}_i \) (\( {D}_j \)) unchanged and sets all other components to zero. Hence, we can rewrite Eq. (22) as:

$$ {\varphi}_i\left({Z}_i\right)={\left\Vert {B}_i-D{Z}_i\right\Vert}_F^2+{\left\Vert {B}_i-D{P}^i{Z}_i\right\Vert}_F^2+{\displaystyle {\sum}_{j\ne i}{\left\Vert D{P}^j{Z}_i\right\Vert}_F^2}+{\lambda}_2{\displaystyle {\sum}_{j\ne i}{\left\Vert {\tilde{Z}}_j^T{Z}_i\right\Vert}_F^2} $$
(23)

Let \( {D}^i=D{P}^i \) and \( {D}^j=D{P}^j \). Equation (23) then becomes:

$$ {\varphi}_i\left({Z}_i\right)={\left\Vert {B}_i-D{Z}_i\right\Vert}_F^2+{\left\Vert {B}_i-{D}^i{Z}_i\right\Vert}_F^2+{\displaystyle {\sum}_{j\ne i}{\left\Vert {D}^j{Z}_i\right\Vert}_F^2}+{\lambda}_2{\displaystyle {\sum}_{j\ne i}{\left\Vert {\tilde{Z}}_j^T{Z}_i\right\Vert}_F^2} $$
(24)
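
The role of the selection matrices can be checked numerically. The following is a minimal sketch with random data and made-up sizes; the names `atoms_per_class`, `selector` and `class_rows` are illustrative assumptions. It builds each \( {P}^c \) as a diagonal 0/1 matrix and compares the compact product of a sub-dictionary with its own coefficient rows against the zero-padded product \( D{P}^c{Z}_i \).

```python
import numpy as np

rng = np.random.default_rng(0)

m, n_i = 6, 4                  # feature dimension, samples in class i (illustrative)
atoms_per_class = [3, 2, 4]    # hypothetical sizes of D_1, D_2, D_3
K = sum(atoms_per_class)

D = rng.standard_normal((m, K))       # full dictionary D = [D_1, D_2, D_3]
Z_i = rng.standard_normal((K, n_i))   # codes of the class-i samples over D


def class_rows(c):
    """Index range of the atoms (columns of D, rows of Z_i) belonging to class c."""
    start = sum(atoms_per_class[:c])
    return slice(start, start + atoms_per_class[c])


def selector(c):
    """Diagonal 0/1 matrix P^c: keeps the rows tied to D_c, zeroes the rest."""
    p = np.zeros(K)
    p[class_rows(c)] = 1.0
    return np.diag(p)


gaps = []
for c in range(len(atoms_per_class)):
    rows = class_rows(c)
    compact = D[:, rows] @ Z_i[rows, :]   # sub-dictionary times its own coefficient rows
    padded = D @ selector(c) @ Z_i        # D P^c Z_i, the form used in Eq. (23)
    gaps.append(np.abs(compact - padded).max())

max_gap = max(gaps)   # largest entrywise difference over all classes
```

Both forms agree up to floating-point rounding, which is why every term of Eq. (22) can be written with the single variable \( {Z}_i \).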

The stacking operator introduced in [30] can be used to write \( {B}_i \) and \( {Z}_i \) as column vectors. We form \( {\varPsi}_i={\left[{b}_{i,1},{b}_{i,2},\cdots, {b}_{i,{n}_i}\right]}^T \), \( {\chi}_i={\left[{z}_{i,1},{z}_{i,2},\cdots, {z}_{i,{n}_i}\right]}^T \), where \( {b}_{i,k},{z}_{i,k}\in {R}^{m\times 1} \) are the k-th columns of \( {B}_i \) and \( {Z}_i \), and thus \( {\varPsi}_i,{\chi}_i\in {R}^{\left(m\cdot {n}_i\right)\times 1} \). Hence, Eq. (24) can be rewritten as:

$$ \begin{array}{l}{\varphi}_i\left({\chi}_i\right)={\left\Vert {\varPsi}_i-\mathrm{diag}(D){\chi}_i\right\Vert}_2^2+{\left\Vert {\varPsi}_i-\mathrm{diag}\left({D}^i\right){\chi}_i\right\Vert}_2^2+{\displaystyle {\sum}_{j\ne i}{\left\Vert \mathrm{diag}\left({D}^j\right){\chi}_i\right\Vert}_2^2}+\hfill \\ {}\kern4em {\lambda}_2{\displaystyle {\sum}_{j\ne i}{\left\Vert \mathrm{diag}\left({\tilde{\chi}}_j^T\right){\chi}_i\right\Vert}_2^2}\hfill \end{array} $$
(25)
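
As a sanity check on the stacking step, the sketch below verifies on random data that a Frobenius-norm term equals the Euclidean norm of its stacked form. The helper names `stack` and `block_diag_repeat` are illustrative assumptions, and, unlike the text, the sketch lets the coefficient vectors have their own length `K` rather than `m`.

```python
import numpy as np

rng = np.random.default_rng(1)
m, K, n = 5, 7, 3                  # illustrative sizes
B = rng.standard_normal((m, n))    # plays the role of B_i
D = rng.standard_normal((m, K))    # dictionary
Z = rng.standard_normal((K, n))    # plays the role of Z_i


def stack(M):
    """Stack the columns of M on top of each other (column-major order)."""
    return M.reshape(-1, order="F")


def block_diag_repeat(T, copies):
    """diag(T): block diagonal matrix with `copies` copies of T on the diagonal."""
    return np.kron(np.eye(copies), T)


psi, chi = stack(B), stack(Z)
frobenius_form = np.linalg.norm(B - D @ Z, "fro") ** 2
stacked_form = np.linalg.norm(psi - block_diag_repeat(D, n) @ chi) ** 2
```

The two values coincide up to rounding because `block_diag_repeat(D, n) @ chi` multiplies every column of `Z` by `D` and stacks the results.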

where \( \mathrm{diag}(T) \) is a block diagonal matrix whose diagonal blocks all equal the matrix \( T \). Expanding the squared norms, \( {\varphi}_i\left({\chi}_i\right) \) equals:

$$ \begin{array}{c}\hfill 2{\varPsi}_i^T{\varPsi}_i-2{\varPsi}_i^T\left(\mathrm{diag}(D)+\mathrm{diag}\left({D}^i\right)\right){\chi}_i+{\chi}_i^T\left(\mathrm{diag}\left({D}^TD\right)+\mathrm{diag}\left({D^i}^T{D}^i\right)\right.+\hfill \\ {}\hfill\ \left.\mathrm{diag}\left({\displaystyle {\sum}_{j\ne i}{D^j}^T{D}^j}\right)+{\lambda}_2\mathrm{diag}\left({\displaystyle {\sum}_{j\ne i}{\tilde{\chi}}_j{\tilde{\chi}}_j^T}\right)\right){\chi}_i\hfill \end{array} $$
(26)

By the second-order condition [5], \( {\varphi}_i\left({\chi}_i\right) \) is convex if and only if its Hessian matrix \( {\nabla}^2{\varphi}_i\left({\chi}_i\right) \) is positive semi-definite. The Hessian matrix of \( {\varphi}_i\left({\chi}_i\right) \) is:

$$ \begin{array}{c}\hfill {\nabla}^2{\varphi}_i\left({\chi}_i\right)=2\mathrm{diag}\left({D}^TD\right)+2\mathrm{diag}\left({D^i}^T{D}^i\right)+2\mathrm{diag}\left({\displaystyle {\sum}_{j\ne i}{D^j}^T{D}^j}\right)+\hfill \\ {}\hfill 2{\lambda}_2\mathrm{diag}\left({\displaystyle {\sum}_{j\ne i}{\tilde{\chi}}_j{\tilde{\chi}}_j^T}\right)\hfill \end{array} $$
(27)

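Each of \( \mathrm{diag}\left({D}^TD\right) \), \( \mathrm{diag}\left({D^i}^T{D}^i\right) \), \( \mathrm{diag}\left({\sum}_{j\ne i}{D^j}^T{D}^j\right) \) and \( \mathrm{diag}\left({\sum}_{j\ne i}{\tilde{\chi}}_j{\tilde{\chi}}_j^T\right) \) is a Gram matrix of the form \( {A}^TA \), or a sum of such matrices, and is therefore symmetric and positive semi-definite; symmetry alone would not be sufficient. Assuming \( {\lambda}_2\ge 0 \), the Hessian matrix \( {\nabla}^2{\varphi}_i\left({\chi}_i\right) \) is a nonnegative combination of positive semi-definite matrices and is hence positive semi-definite. Therefore, \( {\varphi}_i\left({\chi}_i\right) \) is a convex function.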
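
The positive semi-definiteness used here comes from the Gram structure of each term, not from symmetry alone. The short sketch below, with random data and the hypothetical helper `min_eig`, illustrates the difference: a Gram matrix has no negative eigenvalue, whereas a generic symmetric matrix can have one.

```python
import numpy as np

rng = np.random.default_rng(2)
m, K = 5, 7
D = rng.standard_normal((m, K))   # illustrative dictionary


def min_eig(S):
    """Smallest eigenvalue of a symmetric matrix S."""
    return np.linalg.eigvalsh(S).min()


gram = D.T @ D                    # Gram matrix: x^T (D^T D) x = ||D x||^2 >= 0
sym_not_psd = np.array([[1.0, 2.0],
                        [2.0, 1.0]])   # symmetric, but its eigenvalues are 3 and -1

gram_min = min_eig(gram)               # zero up to rounding (rank at most m < K)
counterexample_min = min_eig(sym_not_psd)
```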

Via Eq. (26), we have:

$$ \begin{array}{l}\nabla {\varphi}_i\left({\chi}_i\right)=-2{\left(\mathrm{diag}(D)+\mathrm{diag}\left({D}^i\right)\right)}^T{\varPsi}_i+2\left(\mathrm{diag}\left({D}^TD\right)+\mathrm{diag}\left({D^i}^T{D}^i\right) + \right.\hfill \\ {}\left.\kern5em \mathrm{diag}\left({\displaystyle {\sum}_{j\ne i}{D^j}^T{D}^j}\right)+{\lambda}_2\mathrm{diag}\left({\displaystyle {\sum}_{j\ne i}{\tilde{\chi}}_j{\tilde{\chi}}_j^T}\right)\right){\chi}_i\hfill \end{array} $$
(28)
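
A gradient of this form, written as a column vector, can be checked against central finite differences. The sketch below does this for a single stacked column and, to stay short, omits the \( {\lambda}_2 \) term; the matrices `A1`, `A2`, `A3` are random stand-ins for \( D \), \( {D}^i \) and one \( {D}^j \), and all names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
m, K = 5, 4
A1 = rng.standard_normal((m, K))   # stand-in for D
A2 = rng.standard_normal((m, K))   # stand-in for D^i
A3 = rng.standard_normal((m, K))   # stand-in for a single D^j with j != i
b = rng.standard_normal(m)         # one column of B_i


def phi(z):
    """Quadratic objective for one column, without the lambda_2 term."""
    return (np.sum((b - A1 @ z) ** 2)
            + np.sum((b - A2 @ z) ** 2)
            + np.sum((A3 @ z) ** 2))


def grad_phi(z):
    """Closed-form gradient: -2 (A1 + A2)^T b + 2 (A1^T A1 + A2^T A2 + A3^T A3) z."""
    M = A1.T @ A1 + A2.T @ A2 + A3.T @ A3
    return -2 * (A1 + A2).T @ b + 2 * M @ z


def numerical_grad(f, z, h=1e-6):
    """Central finite-difference approximation of the gradient of f at z."""
    g = np.zeros_like(z)
    for k in range(z.size):
        e = np.zeros_like(z)
        e[k] = h
        g[k] = (f(z + e) - f(z - e)) / (2 * h)
    return g


z0 = rng.standard_normal(K)
gradient_gap = np.abs(grad_phi(z0) - numerical_grad(phi, z0)).max()
```

Because the objective is quadratic, the central difference is exact up to rounding, so `gradient_gap` should be very small.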

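From Eq. (28), \( \nabla {\varphi}_i\left({\chi}_i\right) \) is affine in \( {\chi}_i \), so \( {\varphi}_i \) is continuously differentiable with respect to \( {\chi}_i \). For any two points x and y, Eq. (28) gives: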

$$ \begin{array}{c}\hfill \nabla {\varphi}_i(x)-\nabla {\varphi}_i(y)=2\left(\mathrm{diag}\left({D}^TD\right)+\mathrm{diag}\left({D^i}^T{D}^i\right)+\mathrm{diag}\left({\displaystyle {\sum}_{j\ne i}{D^j}^T{D}^j}\right)+\right.\hfill \\ {}\hfill \left.{\lambda}_2\mathrm{diag}\left({\displaystyle {\sum}_{j\ne i}{\tilde{\chi}}_j{\tilde{\chi}}_j^T}\right)\right)\left(x-y\right)\hfill \end{array} $$
(29)

Hence, we obtain:

$$ \begin{array}{l}\left\Vert \nabla {\varphi}_i(x)-\nabla {\varphi}_i(y)\right\Vert \hfill \\ {}\kern1em =\left\Vert 2\left(\mathrm{diag}\left({D}^TD\right)+\mathrm{diag}\left({D^i}^T{D}^i\right)+\mathrm{diag}\left({\sum}_{j\ne i}{D^j}^T{D}^j\right)+{\lambda}_2\mathrm{diag}\left({\sum}_{j\ne i}{\tilde{\chi}}_j{\tilde{\chi}}_j^T\right)\right)\left(x-y\right)\right\Vert \hfill \\ {}\kern1em \le 2\left\Vert \mathrm{diag}\left({D}^TD\right)+\mathrm{diag}\left({D^i}^T{D}^i\right)+\mathrm{diag}\left({\sum}_{j\ne i}{D^j}^T{D}^j\right)+{\lambda}_2\mathrm{diag}\left({\sum}_{j\ne i}{\tilde{\chi}}_j{\tilde{\chi}}_j^T\right)\right\Vert \left\Vert x-y\right\Vert \hfill \\ {}\kern1em \le 2\left({\lambda}_{\max}^1+{\lambda}_{\max}^2+{\lambda}_{\max}^3+{\lambda}_2{\lambda}_{\max}^4\right)\left\Vert x-y\right\Vert \hfill \end{array} $$
(30)

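where \( {\lambda}_{\max}^1={\lambda}_{\max}\left(\mathrm{diag}\left({D}^TD\right)\right) \), \( {\lambda}_{\max}^2={\lambda}_{\max}\left(\mathrm{diag}\left({D^i}^T{D}^i\right)\right) \), \( {\lambda}_{\max}^3={\lambda}_{\max}\left(\mathrm{diag}\left({\sum}_{j\ne i}{D^j}^T{D}^j\right)\right) \) and \( {\lambda}_{\max}^4={\lambda}_{\max}\left(\mathrm{diag}\left({\displaystyle {\sum}_{j\ne i}{\tilde{\chi}}_j{\tilde{\chi}}_j^T}\right)\right) \). Here \( {\lambda}_{\max}\left(\cdot \right) \) denotes the largest eigenvalue, and the last inequality holds because the spectral norm of a sum of symmetric positive semi-definite matrices is at most the sum of their largest eigenvalues. Hence \( L\left({\varphi}_i\right)=2\left({\lambda}_{\max}^1+{\lambda}_{\max}^2+{\lambda}_{\max}^3+{\lambda}_2{\lambda}_{\max}^4\right) \) is a valid Lipschitz constant of the gradient \( \nabla {\varphi}_i\left({\chi}_i\right) \). It is an upper bound on the smallest such constant, which is twice the largest eigenvalue of the summed matrix.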

Therefore, \( {\varphi}_i\left({Z}_i\right) \) is continuously differentiable with a Lipschitz continuous gradient, with Lipschitz constant \( L\left({\varphi}_i\right) \).
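
The Lipschitz bound can be illustrated numerically. The sketch below uses random data and made-up sizes; the names `sizes`, `masked`, `Z_tilde` and `lam2` are illustrative assumptions, with `lam2 > 0`. Since a block diagonal matrix with identical blocks has the same eigenvalues as one block, it works with a single block. It computes the summed-eigenvalue constant derived above, compares it with the tight constant (the largest eigenvalue of the Hessian block), and tests the gradient inequality on random pairs of points.

```python
import numpy as np

rng = np.random.default_rng(3)
m, lam2 = 6, 0.1
sizes = [3, 2, 4]        # hypothetical number of atoms per class
K = sum(sizes)
i = 0                    # class under consideration

D = rng.standard_normal((m, K))
# Random stand-ins for the coefficient matrices of each class (5 samples each).
Z_tilde = [rng.standard_normal((K, 5)) for _ in sizes]
b = rng.standard_normal(m)   # one column of B_i


def masked(c):
    """D^c = D P^c: copy of D with every column outside class c set to zero."""
    start = sum(sizes[:c])
    out = np.zeros_like(D)
    out[:, start:start + sizes[c]] = D[:, start:start + sizes[c]]
    return out


others = [j for j in range(len(sizes)) if j != i]
terms = [
    D.T @ D,
    masked(i).T @ masked(i),
    sum(masked(j).T @ masked(j) for j in others),
    lam2 * sum(Z_tilde[j] @ Z_tilde[j].T for j in others),
]

H = 2 * sum(terms)                                   # one block of the Hessian
L_sum = 2 * sum(np.linalg.eigvalsh(T).max() for T in terms)
L_tight = np.linalg.eigvalsh(H).max()                # smallest Lipschitz constant


def grad(z):
    """Gradient of the single-column objective: -2 (D + D^i)^T b + H z."""
    return -2 * (D + masked(i)).T @ b + H @ z


ratios = []
for _ in range(100):
    x, y = rng.standard_normal(K), rng.standard_normal(K)
    ratios.append(np.linalg.norm(grad(x) - grad(y)) / np.linalg.norm(x - y))
worst_ratio = max(ratios)
```

On any such data `worst_ratio <= L_tight <= L_sum` holds up to rounding, so the summed-eigenvalue constant is safe to use as a step-size bound (for example a step of `1 / L_sum` in a gradient-based solver), although it can be more conservative than the tight value.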

Cite this article

Yang, BQ., Gu, CC., Wu, KJ. et al. Simultaneous dimensionality reduction and dictionary learning for sparse representation based classification. Multimed Tools Appl 76, 8969–8990 (2017). https://doi.org/10.1007/s11042-016-3492-1
