
Sparse Representation Based Fisher Discrimination Dictionary Learning for Image Classification

Published in: International Journal of Computer Vision

Abstract

The dictionary employed in sparse representation (or sparse coding) based image reconstruction and classification plays an important role, and learning dictionaries from training data has led to state-of-the-art results in image classification tasks. However, many dictionary learning models exploit discriminative information only in the representation coefficients or only in the representation residual, which limits their performance. In this paper, we present a novel dictionary learning method based on the Fisher discrimination criterion. A structured dictionary, whose atoms correspond to the class labels, is learned so that not only can the representation residual be used to distinguish different classes, but the representation coefficients also have small within-class scatter and large between-class scatter. The classification scheme associated with the proposed Fisher discrimination dictionary learning (FDDL) model is then presented; it exploits the discriminative information in both the representation residual and the representation coefficients. The proposed FDDL model is extensively evaluated on various image datasets and shows superior performance to many state-of-the-art dictionary learning methods in a variety of classification tasks.
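The classification rule described here combines per-class reconstruction residuals with distances between coding vectors and learned class mean coefficients. The following is a minimal illustrative sketch (ours, not the paper's exact rule): `fddl_classify`, the plain least-squares coding step, and the weight `w` are stand-ins for the paper's \(l_1\)-regularized coding and tuned weights.

```python
import numpy as np

def fddl_classify(y, sub_dicts, class_means, w=0.5):
    """Residual-plus-coefficient classification rule (illustrative sketch).

    sub_dicts   : list of per-class sub-dictionaries D_i (atoms as columns)
    class_means : list of learned mean coding vectors m_i, one per class
    w           : weight balancing residual against coefficient distance
    """
    scores = []
    for D_i, m_i in zip(sub_dicts, class_means):
        # least-squares coding stands in for the l1-regularized coding step
        alpha, *_ = np.linalg.lstsq(D_i, y, rcond=None)
        residual = np.linalg.norm(y - D_i @ alpha) ** 2
        coef_dist = np.linalg.norm(alpha - m_i) ** 2
        scores.append(residual + w * coef_dist)
    return int(np.argmin(scores))
```

With class-wise sub-dictionaries spanning disjoint subspaces, a signal drawn from one class's span is assigned to that class because its residual under the matching sub-dictionary vanishes.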


Notes

  1. Illuminations {0,1,3,4, 6,7,8,11,13,14,16,17,18,19}.

  2. Illuminations {0,2,4,6,8,10,12,14,16,18}.

References

  • Aharon, M., Elad, M., & Bruckstein, A. (2006). K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Transactions on Signal Processing, 54(11), 4311–4322.

  • Beck, A., & Teboulle, M. (2009). A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1), 183–202.

  • Bengio, S., Pereira, F., Singer, Y., & Strelow, D. (2009). Group sparse coding. In: Proceedings of the Neural Information Processing Systems.

  • Bobin, J., Starck, J., Fadili, J., Moudden, Y., & Donoho, D. (2007). Morphological component analysis: An adaptive thresholding strategy. IEEE Transactions on Image Processing, 16(11), 2675–2681.

  • Boyd, S., & Vandenberghe, L. (2004). Convex optimization. Cambridge: Cambridge University Press.

  • Bryt, O., & Elad, M. (2008). Compression of facial images using the K-SVD algorithm. Journal of Visual Communication and Image Representation, 19(4), 270–282.

  • Candes, E. (2006). Compressive sampling. International Congress of Mathematicians, 3, 1433–1452.

  • Castrodad, A., & Sapiro, G. (2012). Sparse modeling of human actions from motion imagery. International Journal of Computer Vision, 100, 1–15.

  • Cooley, J. W., & Tukey, J. W. (1965). An algorithm for the machine calculation of complex Fourier series. Mathematics of Computation, 19, 297–301.

  • Deng, W. H., Hu, J. N., & Guo, J. (2012). Extended SRC: Undersampled face recognition via intraclass variation dictionary. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(9), 1864–1870.

  • Duda, R., Hart, P., & Stork, D. (2000). Pattern classification (2nd ed.). New York: Wiley-Interscience.

  • Elad, M., & Aharon, M. (2006). Image denoising via sparse and redundant representations over learned dictionaries. IEEE Transactions on Image Processing, 15(12), 3736–3745.

  • Engan, K., Aase, S. O., & Husoy, J. H. (1999). Method of optimal directions for frame design. In: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing.

  • Fernando, B., Fromont, E., & Tuytelaars, T. (2012). Effective use of frequent itemset mining for image classification. In: Proceedings of the European Conference on Computer Vision.

  • Gehler, P., & Nowozin, S. (2009). On feature combination for multiclass object classification. In: Proceedings of the International Conference on Computer Vision.

  • Georghiades, A., Belhumeur, P., & Kriegman, D. (2001). From few to many: Illumination cone models for face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(6), 643–660.

  • Gross, R., Matthews, I., Cohn, J., Kanade, T., & Baker, S. (2010). Multi-PIE. Image and Vision Computing, 28, 807–813.

  • Guha, T., & Ward, R. K. (2012). Learning sparse representations for human action recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(8), 1576–1588.

  • Guo, Y., Li, S., Yang, J., Shu, T., & Wu, L. (2003). A generalized Foley–Sammon transform based on generalized Fisher discrimination criterion and its application to face recognition. Pattern Recognition Letters, 24(1), 147–158.

  • Hoyer, P. O. (2002). Non-negative sparse coding. In: Proceedings of the IEEE Workshop on Neural Networks for Signal Processing.

  • Huang, K., & Aviyente, S. (2006). Sparse representation for signal classification. In: Proceedings of the Neural Information Processing Systems.

  • Hull, J. J. (1994). A database for handwritten text recognition research. IEEE Transactions on Pattern Analysis and Machine Intelligence, 16(5), 550–554.

  • Jenatton, R., Mairal, J., Obozinski, G., & Bach, F. (2011). Proximal methods for hierarchical sparse coding. Journal of Machine Learning Research, 12, 2234–2297.

  • Jia, Y. Q., Nie, F. P., & Zhang, C. S. (2009). Trace ratio problem revisited. IEEE Transactions on Neural Networks, 20(4), 729–735.

  • Jiang, Z. L., Lin, Z., & Davis, L. S. (2013). Label consistent K-SVD: Learning a discriminative dictionary for recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34, 533.

  • Jiang, Z. L., Zhang, G. X., & Davis, L. S. (2012). Submodular dictionary learning for sparse coding. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

  • Kim, S. J., Koh, K., Lustig, M., Boyd, S., & Gorinevsky, D. (2007). An interior-point method for large-scale \(l_{1}\)-regularized least squares. IEEE Journal on Selected Topics in Signal Processing, 1, 606–617.

  • Kong, S., & Wang, D. H. (2012). A dictionary learning approach for classification: Separating the particularity and the commonality. In: Proceedings of the European Conference on Computer Vision.

  • Li, H., Jiang, T., & Zhang, K. (2006). Efficient and robust feature extraction by maximum margin criterion. IEEE Transactions on Neural Networks, 17(1), 157–165.

  • Lian, X. C., Li, Z. W., Lu, B. L., & Zhang, L. (2010). Max-margin dictionary learning for multi-class image categorization. In: Proceedings of the European Conference on Computer Vision.

  • Mairal, J., Bach, F., & Ponce, J. (2012). Task-driven dictionary learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(4), 791–804.

  • Mairal, J., Bach, F., Ponce, J., Sapiro, G., & Zisserman, A. (2008b). Learning discriminative dictionaries for local image analysis. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

  • Mairal, J., Bach, F., Ponce, J., Sapiro, G., & Zisserman, A. (2009). Supervised dictionary learning. In: Proceedings of the Neural Information Processing Systems.

  • Mairal, J., Elad, M., & Sapiro, G. (2008a). Sparse representation for color image restoration. IEEE Transactions on Image Processing, 17(1), 53–69.

  • Mairal, J., Leordeanu, M., Bach, F., Hebert, M., & Ponce, J. (2008c). Discriminative sparse image models for class-specific edge detection and image interpretation. In: Proceedings of the European Conference on Computer Vision.

  • Mallat, S. (1999). A wavelet tour of signal processing (2nd ed.). San Diego: Academic Press.

  • Martinez, A., & Benavente, R. (1998). The AR face database. CVC Technical Report No. 24.

  • Nesterov, Y., & Nemirovskii, A. (1994). Interior-point polynomial algorithms in convex programming. Philadelphia: SIAM.

  • Nilsback, M., & Zisserman, A. (2006). A visual vocabulary for flower classification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

  • Okatani, T., & Deguchi, K. (2007). On the Wiberg algorithm for matrix factorization in the presence of missing components. International Journal of Computer Vision, 72(3), 329–337.

  • Oliva, A., & Torralba, A. (2001). Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, 42, 145–174.

  • Olshausen, B. A., & Field, D. J. (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381, 607–609.

  • Olshausen, B. A., & Field, D. J. (1997). Sparse coding with an overcomplete basis set: A strategy employed by V1? Vision Research, 37(23), 3311–3325.

  • Pham, D., & Venkatesh, S. (2008). Joint learning and dictionary construction for pattern recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

  • Phillips, P. J., Flynn, P. J., Scruggs, W. T., Bowyer, K. W., Chang, J., Hoffman, K., et al. (2005). Overview of the face recognition grand challenge. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

  • Qiu, Q., Jiang, Z. L., & Chellappa, R. (2011). Sparse dictionary-based representation and recognition of action attributes. In: Proceedings of the International Conference on Computer Vision.

  • Ramirez, I., Sprechmann, P., & Sapiro, G. (2010). Classification and clustering via dictionary learning with structured incoherence and shared features. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

  • Rodriguez, F., & Sapiro, G. (2007). Sparse representation for image classification: Learning discriminative and reconstructive non-parametric dictionaries. IMA Preprint 2213.

  • Rodriguez, M., Ahmed, J., & Shah, M. (2008). A spatio-temporal maximum average correlation height filter for action recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

  • Rosasco, L., Verri, A., Santoro, M., Mosci, S., & Villa, S. (2009). Iterative projection methods for structured sparsity regularization. MIT Technical Report MIT-CSAIL-TR-2009-050, CBCL-282.

  • Rubinstein, R., Bruckstein, A. M., & Elad, M. (2010). Dictionaries for sparse representation modeling. Proceedings of the IEEE, 98(6), 1045–1057.

  • Sadanand, S., & Corso, J. J. (2012). Action bank: A high-level representation of activity in video. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

  • Shen, L., Wang, S. H., Sun, G., Jiang, S. Q., & Huang, Q. M. (2013). Multi-level discriminative dictionary learning towards hierarchical visual categorization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

  • Song, F. X., Zhang, D., Mei, D. Y., & Guo, Z. W. (2007). A multiple maximum scatter difference discriminant criterion for facial feature extraction. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 37(6), 1599–1606.

  • Sprechmann, P., & Sapiro, G. (2010). Dictionary learning and sparse coding for unsupervised clustering. In: Proceedings of the International Conference on Acoustics, Speech and Signal Processing.

  • Szabo, Z., Poczos, B., & Lorincz, A. (2011). Online group-structured dictionary learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

  • Tropp, J. A., & Wright, S. J. (2010). Computational methods for sparse solution of linear inverse problems. Proceedings of the IEEE, 98(6), 948–958.

  • Turk, M., & Pentland, A. (1991). Eigenfaces for recognition. Journal of Cognitive Neuroscience, 3(1), 71–86.

  • Viola, P., & Jones, M. J. (2004). Robust real-time face detection. International Journal of Computer Vision, 57, 137–154.

  • Wagner, A., Wright, J., Ganesh, A., Zhou, Z. H., Mobahi, H., & Ma, Y. (2012). Toward a practical face recognition system: Robust alignment and illumination by sparse representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(2), 373–386.

  • Wang, H., Ullah, M., Klaser, A., Laptev, I., & Schmid, C. (2009). Evaluation of local spatio-temporal features for action recognition. In: Proceedings of the British Machine Vision Conference.

  • Wang, H., Yan, S. C., Xu, D., Tang, X. O., & Huang, T. (2007). Trace ratio versus ratio trace for dimensionality reduction. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

  • Wang, H. R., Yuan, C. F., Hu, W. M., & Sun, C. Y. (2012). Supervised class-specific dictionary learning for sparse modeling in action recognition. Pattern Recognition, 45(11), 3902–3911.

  • Wright, J., Yang, A. Y., Ganesh, A., Sastry, S. S., & Ma, Y. (2009b). Robust face recognition via sparse representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(2), 210–227.

  • Wright, S. J., Nowak, R. D., & Figueiredo, M. A. T. (2009a). Sparse reconstruction by separable approximation. IEEE Transactions on Signal Processing, 57(7), 2479–2493.

  • Wu, Y. N., Si, Z. Z., Gong, H. F., & Zhu, S. C. (2010). Learning active basis model for object detection and recognition. International Journal of Computer Vision, 90, 198–235.

  • Xie, N., Ling, H., Hu, W., & Zhang, X. (2010). Use bin-ratio information for category and scene classification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

  • Yang, A. Y., Ganesh, A., Zhou, Z. H., Sastry, S. S., & Ma, Y. (2010a). A review of fast \(l_{1}\)-minimization algorithms for robust face recognition. arXiv:1007.3753v2.

  • Yang, J. C., Wright, J., Ma, Y., & Huang, T. (2008). Image super-resolution as sparse representation of raw image patches. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

  • Yang, J. C., Yu, K., Gong, Y., & Huang, T. (2009). Linear spatial pyramid matching using sparse coding for image classification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

  • Yang, J. C., Yu, K., & Huang, T. (2010b). Supervised translation-invariant sparse coding. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

  • Yang, M., & Zhang, L. (2010). Gabor feature based sparse representation for face recognition with Gabor occlusion dictionary. In: Proceedings of the European Conference on Computer Vision.

  • Yang, M., Zhang, L., Feng, X. C., & Zhang, D. (2011b). Fisher discrimination dictionary learning for sparse representation. In: Proceedings of the International Conference on Computer Vision.

  • Yang, M., Zhang, L., Yang, J., & Zhang, D. (2010c). Metaface learning for sparse representation based face recognition. In: Proceedings of the IEEE Conference on Image Processing.

  • Yang, M., Zhang, L., Yang, J., & Zhang, D. (2011a). Robust sparse coding for face recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

  • Yang, M., Zhang, L., & Zhang, D. (2012). Efficient misalignment robust representation for real-time face recognition. In: Proceedings of the European Conference on Computer Vision.

  • Yao, A., Gall, J., & Gool, L. V. (2010). A Hough transform-based voting framework for action recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

  • Ye, G. N., Liu, D., Jhuo, I.-H., & Chang, S.-F. (2012). Robust late fusion with rank minimization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

  • Yu, K., Xu, W., & Gong, Y. (2009). Deep learning with kernel regularization for visual recognition. In: Advances in Neural Information Processing Systems 21.

  • Yuan, X. T., & Yan, S. C. (2010). Visual classification with multitask joint sparse representation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

  • Zhang, L., Yang, M., & Feng, X. C. (2011). Sparse representation or collaborative representation: Which helps face recognition? In: Proceedings of the International Conference on Computer Vision.

  • Zhang, Q., & Li, B. X. (2010). Discriminative K-SVD for dictionary learning in face recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

  • Zhang, Z. D., Ganesh, A., Liang, X., & Ma, Y. (2012). TILT: Transform invariant low-rank textures. International Journal of Computer Vision, 99, 1–24.

  • Zhou, M. Y., Chen, H. J., Paisley, J., Ren, L., Li, L. B., Xing, Z. M., et al. (2012). Nonparametric Bayesian dictionary learning for analysis of noisy and incomplete images. IEEE Transactions on Image Processing, 21(1), 130–144.

  • Zhou, N., & Fan, J. P. (2012). Learning inter-related visual dictionary for object recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

  • Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society B, 67(2), 301–320.


Author information


Corresponding author

Correspondence to Lei Zhang.

Additional information

Communicated by K. Ikeuchi.

Appendices

Appendix 1: \({{\varvec{tr}}}({{\varvec{S}}}_{B}({{\varvec{X}}}))\) when \(X_{i}^j =0,\;j\ne i\)

Denote by \({{\varvec{m}}}_{i}^{i}, {{\varvec{m}}}_{i}\) and \({{\varvec{m}}}\) the mean vectors of \(\varvec{X}_{i}^i , {{\varvec{X}}}_{i}\) and \({{\varvec{X}}}\), respectively. Because \(\varvec{X}_{i}^j =0\) for \(j\ne i\), we can rewrite \(\varvec{m}_{i} =\left[ {\mathbf{0};\ldots ;\varvec{m}_{i}^i ;\ldots ;\mathbf{0}} \right] \) and \(\varvec{m}={\left[ {n_1 \varvec{m}_1^1 ;\ldots ;n_{i} \varvec{m}_{i}^i ;\ldots ;n_K \varvec{m}_K^K } \right] }\Big /n\). Therefore, the between-class scatter, i.e. \(tr\left( {\varvec{S}_B \left( \varvec{X} \right) } \right) =\sum _{i=1}^K {n_{i} \left\| {\varvec{m}_{i} -\varvec{m}} \right\| _2^2 } ,\) becomes

$$\begin{aligned}&\varvec{S}_B \left( \varvec{X} \right) =\sum \nolimits _{i=1}^K {n_{i} }/{n^{2}}\left[ -n_1 \varvec{m}_1^1 ;\ldots ;\left( {n-n_{i} } \right) \varvec{m}_{i}^i ;\right. \\&\left. \quad \ldots ;-n_K \varvec{m}_K^K \right] \\&\quad \left[ -n_1 \varvec{m}_1^1 ;\ldots ;\left( {n-n_{i} } \right) \varvec{m}_{i}^i ;\ldots ;-n_K \varvec{m}_K^K \right] ^{T}. \end{aligned}$$

Denote by \(\varvec{\kappa }_{i}=1-n_{i}/n\). After some derivation, the trace of \({{\varvec{S}}}_{B}({{\varvec{X}}})\) becomes

$$\begin{aligned}&tr\left( {\varvec{S}_B \left( \varvec{X} \right) } \right) =\sum _{i=1}^K {n_{i} }\Big /{n^{2}}\\&\left\| \left[ -n_1 \varvec{m}_1^1 ;\ldots ;\left( {n-n_{i} } \right) \varvec{m}_{i}^i ;\ldots ;-n_K \varvec{m}_K^K \right] \right\| _2^2\\&=\sum _{i=1}^K {\varvec{\kappa }_{i} n_{i} \left\| {\varvec{m}_{i}^i } \right\| _2^2 } . \end{aligned}$$

Because \(\varvec{m}_{i}^i \) is the mean representation vector of the samples from the same class, it will generally have non-negligible entries, so the trace of the between-class scatter will generally have large energy.
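The identity \(tr(\varvec{S}_B(\varvec{X}))=\sum _i \kappa _i n_i \Vert \varvec{m}_i^i\Vert _2^2\) is easy to verify numerically. The sketch below (ours, not from the paper) builds random coefficient matrices with \(\varvec{X}_i^j=0\) for \(j\ne i\) and compares the trace computed from its definition against the closed form:

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = [3, 5, 4]                 # n_i samples per class; n = 12
d_i = 2                           # rows in each diagonal block X_i^i
K, n = len(sizes), sum(sizes)

# coefficient matrices with X_i^j = 0 for j != i (only diagonal blocks nonzero)
X = []
for i, n_i in enumerate(sizes):
    Xi = np.zeros((d_i * K, n_i))
    Xi[i * d_i:(i + 1) * d_i, :] = rng.standard_normal((d_i, n_i))
    X.append(Xi)

means = [Xi.mean(axis=1) for Xi in X]                       # m_i
m = sum(n_i * mi for n_i, mi in zip(sizes, means)) / n      # overall mean

# tr(S_B(X)) from its definition: sum_i n_i ||m_i - m||^2
tr_SB = sum(n_i * np.linalg.norm(mi - m) ** 2 for n_i, mi in zip(sizes, means))

# closed form from Appendix 1: sum_i kappa_i n_i ||m_i^i||^2, kappa_i = 1 - n_i/n
tr_SB_closed = sum((1 - n_i / n) * n_i * np.linalg.norm(mi) ** 2
                   for n_i, mi in zip(sizes, means))
```

Since the off-diagonal blocks of each \(\varvec{X}_i\) are zero, \(\Vert \varvec{m}_i\Vert _2=\Vert \varvec{m}_i^i\Vert _2\), so the full stacked mean can be used directly in the closed form.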

Appendix 2: Derivation of the Simplified FDDL Model

Denote by \(\varvec{m}_{i}^i \) and \({{\varvec{m}}}_{i}\) the mean vector of \(\varvec{X}_{i}^i \) and \({{\varvec{X}}}_{i}\), respectively. Because \(\varvec{X}_{i}^j =0\) for \(j\ne i\), we can rewrite \(\varvec{m}_{i} =\left[ {\mathbf{0};\ldots ;\varvec{m}_{i}^i ;\ldots ;\mathbf{0}} \right] \). So the within-class scatter changes to

$$\begin{aligned} \varvec{S}_W \left( \varvec{X} \right) =\sum \nolimits _{i=1}^K {\sum \nolimits _{{\varvec{x}}_k \in {\varvec{X}}_{i} } {\left( {{{\varvec{x}}}_k^i -\varvec{m}_{i}^i } \right) \left( {{{\varvec{x}}}_k^i -\varvec{m}_{i}^i } \right) ^{T}} } . \end{aligned}$$

The trace of within-class scatter is

$$\begin{aligned} tr\left( {\varvec{S}_W \left( \varvec{X} \right) } \right) =\sum \nolimits _{i=1}^K {\sum \nolimits _{{\varvec{x}}_k \in {\varvec{X}}_{i} } {\left\| {\varvec{x}_k^i -\varvec{m}_{i}^i } \right\| _2^2 } }. \end{aligned}$$

Based on Appendix 1, the trace of the between-class scatter is \(tr\left( {\varvec{S}_B \left( \varvec{X} \right) } \right) =\sum \nolimits _{i=1}^K {\varvec{\kappa }_{i} n_{i} \left\| {\varvec{m}_{i}^i } \right\| _2^2 } \), where \(\kappa _{i}=1-n_{i}/n\). Therefore the discriminative coefficient term, i.e. \(f\left( \varvec{X} \right) =tr\left( {\varvec{S}_W \left( \varvec{X} \right) } \right) -tr\left( {\varvec{S}_B \left( \varvec{X} \right) } \right) +\eta \left\| \varvec{X} \right\| _F^2 \), can be simplified to

$$\begin{aligned} f\left( \varvec{X} \right)&= \sum \nolimits _{i=1}^K \left( \sum \nolimits _{{\varvec{x}}_k \in {\varvec{X}}_{i} } {\left\| {\varvec{x}_k^i -\varvec{m}_{i}^i } \right\| _2^2 }\right. \\&\quad \left. +\,\kappa _{i} \left( {\left\| {\varvec{X}_{i}^i } \right\| _F^2 -n_{i} \left\| {\varvec{m}_{i}^i } \right\| _2^2 } \right) +\left( {\eta -\kappa _{i} } \right) \left\| {\varvec{X}_{i}^i } \right\| _F^2 \right) . \end{aligned}$$

Denote by \(\varvec{E}_{i}^j =\left[ 1 \right] _{n_{i} \times n_j } \) the matrix of size \(n_{i}\times n_{j}\) with all entries being 1, then \(\varvec{M}_{i}^i =\left[ {\varvec{m}_{i}^i } \right] _{1\times n_{i} } ={\varvec{X}_{i}^i \varvec{E}_{i}^i }/{n_{i} }\). Because \({{\varvec{I}}}-{\varvec{E}_{i}^i }/{n_{i} }\left( {{\varvec{E}_{i}^i }/{n_{i} }} \right) ^{T}= ({{\varvec{I}}}-{\varvec{E}_{i}^i }/{n_{i} })({{\varvec{I}}}-{\varvec{E}_{i}^i }/{n_{i} })^{T}\), we have

$$\begin{aligned}&\left\| {\varvec{X}_{i}^i } \right\| _F^2 -n_{i} \left\| {\varvec{m}_{i}^i } \right\| _2^2 =\left\| {\varvec{X}_{i}^i } \right\| _F^2 -\left\| {\left[ {\varvec{m}_{i}^i } \right] _{1\times n_{i} } } \right\| _F^2\\&\quad =tr\left( {\varvec{X}_{i}^i \left( {\varvec{I}-{\varvec{E}_{i}^i }/{n_{i} }\left( {{\varvec{E}_{i}^i }/{n_{i} }} \right) ^{T}} \right) \left( {\varvec{X}_{i}^i } \right) ^{T}} \right) \\&\quad =tr\left( {\varvec{X}_{i}^i \left( {\varvec{I}-{\varvec{E}_{i}^i }/{n_{i} }} \right) \left( {\varvec{I}-{\varvec{E}_{i}^i }/{n_{i} }} \right) ^{T}\left( {\varvec{X}_{i}^i } \right) ^{T}} \right) \\&\quad =\left\| {\varvec{X}_{i}^i -\left[ {\varvec{m}_{i}^i } \right] _{1\times n_{i} } } \right\| _F^2 =\left\| {\varvec{X}_{i}^i -\varvec{M}_{i}^i } \right\| _F^2 \end{aligned}$$

Then the discriminative coefficient term could be written as

$$\begin{aligned} f\left( \varvec{X} \right)&=\sum \nolimits _{i=1}^K \left( {\sum \nolimits _{{\varvec{x}}_k \in {\varvec{X}}_{i} } {\left\| {{{\varvec{x}}}_k^i -\varvec{m}_{i}^i } \right\| _2^2 } +\kappa _{i} \left\| {\varvec{X}_{i}^i -\varvec{M}_{i}^i } \right\| _F^2}\right. \\&\quad \left. {+\left( {\eta -\kappa _{i} } \right) \left\| {\varvec{X}_{i}^i } \right\| _F^2 } \right) \\&=\sum \nolimits _{i=1}^K {\left( {\left( {1\!+\!\kappa _{i} } \right) \left\| {\varvec{X}_{i}^i -\varvec{M}_{i}^i } \right\| _F^2 +\left( {\eta -\kappa _{i} } \right) \left\| {\varvec{X}_{i}^i } \right\| _F^2 } \right) } \end{aligned}$$
(21)
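Eq. (21) can be checked numerically: for block-diagonal coefficient matrices, the raw Fisher term \(tr(\varvec{S}_W)-tr(\varvec{S}_B)+\eta \Vert \varvec{X}\Vert _F^2\) must coincide with the simplified per-class form. A NumPy sketch (ours; sizes and dimensions arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
sizes = [4, 3, 5]
d_i = 2                           # atoms per class sub-dictionary
K, n = len(sizes), sum(sizes)
eta = 1.0

# full coefficient matrices with X_i^j = 0 for j != i
X = []
for i, n_i in enumerate(sizes):
    Xi = np.zeros((d_i * K, n_i))
    Xi[i * d_i:(i + 1) * d_i, :] = rng.standard_normal((d_i, n_i))
    X.append(Xi)

means = [Xi.mean(axis=1) for Xi in X]
m = sum(n_i * mi for n_i, mi in zip(sizes, means)) / n

# f(X) from the raw Fisher definition
tr_SW = sum(((Xi - mi[:, None]) ** 2).sum() for Xi, mi in zip(X, means))
tr_SB = sum(n_i * ((mi - m) ** 2).sum() for n_i, mi in zip(sizes, means))
f_raw = tr_SW - tr_SB + eta * sum((Xi ** 2).sum() for Xi in X)

# f(X) from Eq. (21), using only the nonzero diagonal blocks X_i^i
f_21 = 0.0
for i, (Xi, n_i) in enumerate(zip(X, sizes)):
    kappa = 1 - n_i / n
    B = Xi[i * d_i:(i + 1) * d_i, :]                  # X_i^i
    M = np.tile(B.mean(axis=1)[:, None], (1, n_i))    # M_i^i
    f_21 += (1 + kappa) * ((B - M) ** 2).sum() + (eta - kappa) * (B ** 2).sum()
```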

With the constraint that \(\varvec{X}_{i}^j =0\) for \(j\ne i\) in Eq. (10), we have

$$\begin{aligned} \left\| {\varvec{A}_{i} -\varvec{DX}_{i} } \right\| _F^2 =\left\| {\varvec{A}_{i} -\varvec{D}_{i} \varvec{X}_{i}^i } \right\| _F^2 \end{aligned}$$
(22)

With Eqs. (21) and (22), the simplified FDDL model [i.e., Eq. (10)] can be written as

$$\begin{aligned} \min _{\varvec{D},\varvec{X}} \sum \nolimits _{i=1}^K \left( \frac{1}{2}\left\| {\varvec{A}_{i} -\varvec{D}_{i} \varvec{X}_{i}^i } \right\| _F^2 +{\lambda }'_1 \left\| {\varvec{X}_{i}^i } \right\| _1 +{\lambda }'_2 \left\| {\varvec{X}_{i}^i -\varvec{M}_{i}^i } \right\| _F^2 +{\lambda }'_3 \left\| {\varvec{X}_{i}^i } \right\| _F^2 \right) \end{aligned}$$
(23)

where \({\lambda }'_1 ={\lambda _1 }/2\), \({\lambda }'_2 ={\lambda _2 \left( {1+\kappa _{i} } \right) }/2\), and \({\lambda }'_3 ={\lambda _2 \left( {\eta -\kappa _{i} } \right) }/2\).

Appendix 3: The Convexity of \(f_{i}({{\varvec{X}}}_{i})\)

Let \(\varvec{E}_{i}^j =\left[ 1 \right] _{n_{i} \times n_j } \) be a matrix of size \(n_{i} \times n_{j}\) with all entries being 1, and let \(\varvec{N}_{i} =\varvec{I}_{n_{i} \times n_{i} } -{\varvec{E}_{i}^i }/{n_{i} },\,\varvec{P}_{i} ={\varvec{E}_{i}^i }/{n_{i} }-{\varvec{E}_{i}^i }/n,\,\varvec{C}_{i}^j ={\varvec{E}_{i}^j }/n\), where \(\varvec{I}_{n_{i} \times n_{i} } \) is an identity matrix of size \(n_{i}\times n_{i}\).

From \(f_{i} \left( {\varvec{X}_{i} } \right) =\left\| {\varvec{X}_{i} -\varvec{M}_{i} } \right\| _F^2 -\sum \nolimits _{k=1}^K {\left\| {\varvec{M}_k -\varvec{M}} \right\| _F^2 } +\eta \left\| {\varvec{X}_{i} } \right\| _F^2 \), we can derive that

$$\begin{aligned} f_{i} \left( {\varvec{X}_{i} } \right)&=\left\| {\varvec{X}_{i} \varvec{N}_{i} } \right\| _F^2 -\left\| {\varvec{X}_{i} \varvec{P}_{i} -\varvec{G}} \right\| _F^2\\&\quad -\sum \nolimits _{k=1,k\ne i}^K {\left\| {\varvec{Z}_k -\varvec{X}_{i} \varvec{C}_i^k } \right\| _F^2 } +\eta \left\| {\varvec{X}_{i} } \right\| _F^2 \end{aligned}$$
(24)

where \(\varvec{G}=\sum \nolimits _{k=1,k\ne i}^K {\varvec{X}_k \varvec{C}_k^i } ,\,\varvec{Z}_k ={\varvec{X}_k \varvec{E}_k^k }/{n_k }-\sum \nolimits _{j=1,j\ne i}^K \varvec{X}_j \varvec{C}_j^k\).

Rewrite \({{\varvec{X}}}_{i}\) as a column vector, \(\varvec{\chi }_{i} \!=\!\left[ {\varvec{r}_{i,1} ,\varvec{r}_{i,2} ,\ldots ,\varvec{r}_{i,d} } \right] ^{T}\), where \({{\varvec{r}}}_{i,j}\) is the \(j\)th row vector of \({{\varvec{X}}}_{i}\) and \(d\) is the number of rows of \({{\varvec{X}}}_{i}\). Then \(f_{i}({{\varvec{X}}}_{i})\) equals

$$\begin{aligned}&\left\| {\hbox {diag}\left( {\varvec{N}_{i} ^{T}} \right) \varvec{\chi }_{i} } \right\| _2^2 -\left\| \hbox {diag}\left( {\varvec{P}_{i} ^{T}} \right) \varvec{\chi }_{i}\right. \\&-\left. \hbox {vec}\left( {\varvec{G}^{T}} \right) \right\| _2^2 -\sum \nolimits _{k=1,k\ne i}^K \left\| \hbox {diag}\left( {\left( {\varvec{C}_{i}^k } \right) ^{T}} \right) \varvec{\chi }_{i}\right. \\&-\left. \hbox {vec}\left( {\varvec{Z}_k^T } \right) \right\| _2^2 +\eta \left\| {\varvec{\chi }_{i} } \right\| _2^2 \end{aligned}$$

where diag\(({{\varvec{T}}})\) constructs a block-diagonal matrix with each diagonal block equal to \({{\varvec{T}}}\), and vec\(({{\varvec{T}}})\) constructs a column vector by concatenating all the column vectors of \({{\varvec{T}}}\).
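The vectorization step rests on the identity \(\Vert \varvec{X}\varvec{N}\Vert _F^2=\Vert \hbox {diag}(\varvec{N}^{T})\varvec{\chi }\Vert _2^2\) when \(\varvec{\chi }\) stacks the rows of \(\varvec{X}\). A quick numerical check (ours), using a Kronecker product to build the block-diagonal matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
d, n_i = 3, 5
X = rng.standard_normal((d, n_i))     # plays the role of X_i
N = rng.standard_normal((n_i, n_i))   # any coefficient-side matrix, e.g. N_i

chi = X.reshape(-1)                   # rows of X stacked into one long column
blockN = np.kron(np.eye(d), N.T)      # diag(N^T): d copies of N^T on the diagonal

lhs = np.linalg.norm(X @ N, 'fro') ** 2
rhs = np.linalg.norm(blockN @ chi) ** 2
```

Each block of `blockN @ chi` is \(\varvec{N}^{T}\varvec{r}_{j}\), whose squared norm equals that of the \(j\)th row of \(\varvec{X}\varvec{N}\), so the two quantities agree.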

The convexity of \(f_{i}(\varvec{\chi }_{i})\) depends on whether its Hessian matrix \(\nabla ^{2}f_{i}(\varvec{\chi }_{i})\) is positive definite (Boyd and Vandenberghe 2004). The Hessian matrix of \(f_{i}(\varvec{\chi }_{i})\) can be written as

$$\begin{aligned} \nabla ^{2}f_{i} \left( {\varvec{\chi }_{i} } \right)&= 2\hbox {diag}\left( {\varvec{N}_{i} \varvec{N}_{i} ^{T}} \right) -2\hbox {diag}\left( {\varvec{P}_{i} \varvec{P}_{i} ^{T}} \right) \\&\quad -\sum \nolimits _{k=1,k\ne i}^K {\hbox {2diag}\left( {\varvec{C}_{i}^k \left( {\varvec{C}_{i}^k } \right) ^{T}} \right) } +2\eta \varvec{I}. \end{aligned}$$

\(\nabla ^{2}f_{i}(\varvec{\chi }_{i})\) will be positive definite if the following matrix \({{\varvec{S}}}\) is positive definite:

$$\begin{aligned} \varvec{S}=\varvec{N}_{i} \varvec{N}_{i}^T -\left( {\varvec{P}_{i} \varvec{P}_{i}^T +\sum _{k=1,k\ne i}^K {\varvec{C}_{i}^k \left( {\varvec{C}_{i}^k } \right) ^{T}} } \right) +\eta \varvec{I}. \end{aligned}$$

After some derivations, we have

$$\begin{aligned} \varvec{S}=\left( {1+\eta } \right) \varvec{I}-\varvec{E}_{i}^i \left( {2}/{n_{i}} - {2}/{n} +\sum \nolimits _{k=1}^K {{n_k }/{n^{2}}} \right) . \end{aligned}$$

In order to make \({{\varvec{S}}}\) positive definite, every eigenvalue of \({{\varvec{S}}}\) must be greater than 0. Because the maximal eigenvalue of \({{\varvec{E}}}_{i}^{i}\) is \(n_{i}\), we should ensure

$$\begin{aligned} \left( {1+\eta } \right) -n_{i} \left( {2}/{n_{i}} - {2}/{n} +\sum _{k=1}^K {{n_k }/{n^{2}}} \right) >0 \end{aligned}$$

For \(n=n_{1}+n_{2}+\cdots +n_{K}\), this reduces to \(\eta >\kappa _{i}\), where \(\kappa _{i}=1-n_{i}/n\), which guarantees that \(f_{i}({{\varvec{X}}}_{i})\) is convex with respect to \({{\varvec{X}}}_{i}\).
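The threshold \(\eta >\kappa _{i}\) can be confirmed by direct eigenvalue computation: since the eigenvalues of \(\varvec{E}_{i}^{i}\) are \(n_i\) and 0, the smallest eigenvalue of \(\varvec{S}\) works out to exactly \(\eta -\kappa _{i}\). A NumPy check (ours; the class sizes are arbitrary):

```python
import numpy as np

sizes = [4, 3, 5]                 # n_i per class
n = sum(sizes)
i = 0                             # class under consideration
n_i = sizes[i]
kappa_i = 1 - n_i / n

def min_eig(eta):
    """Smallest eigenvalue of S = (1+eta)I - E_i^i (2/n_i - 2/n + sum_k n_k/n^2)."""
    E = np.ones((n_i, n_i))       # E_i^i: all-ones matrix
    c = 2 / n_i - 2 / n + sum(n_k / n ** 2 for n_k in sizes)
    S = (1 + eta) * np.eye(n_i) - c * E
    return np.linalg.eigvalsh(S).min()
```

So \(\varvec{S}\) is positive definite precisely when \(\eta \) exceeds \(\kappa _{i}\), and loses definiteness as soon as \(\eta \) drops below it.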


Cite this article

Yang, M., Zhang, L., Feng, X. et al. Sparse Representation Based Fisher Discrimination Dictionary Learning for Image Classification. Int J Comput Vis 109, 209–232 (2014). https://doi.org/10.1007/s11263-014-0722-8
