Abstract
The dictionary employed in sparse representation (or sparse coding) based image reconstruction and classification plays an important role, and learning dictionaries from training data has led to state-of-the-art results in image classification tasks. However, many dictionary learning models exploit the discriminative information in only the representation coefficients or only the representation residual, which limits their performance. In this paper we present a novel dictionary learning method based on the Fisher discrimination criterion. A structured dictionary, whose atoms correspond to the subject class labels, is learned so that not only can the representation residual be used to distinguish different classes, but the representation coefficients also have small within-class scatter and large between-class scatter. A classification scheme associated with the proposed Fisher discrimination dictionary learning (FDDL) model is then presented, which exploits the discriminative information in both the representation residual and the representation coefficients. The proposed FDDL model is extensively evaluated on various image datasets and shows superior performance to many state-of-the-art dictionary learning methods in a variety of classification tasks.
Notes
Illuminations {0,1,3,4,6,7,8,11,13,14,16,17,18,19}.
Illuminations {0,2,4,6,8,10,12,14,16,18}.
References
Aharon, M., Elad, M., & Bruckstein, A. (2006). K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Transactions on Signal Processing, 54(11), 4311–4322.
Beck, A., & Teboulle, M. (2009). A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1), 183–202.
Bengio, S., Pereira, F., Singer, Y., & Strelow, D. (2009). Group sparse coding. In Proceedings of the Neural Information Processing Systems
Bobin, J., Starck, J., Fadili, J., Moudden, Y., & Donoho, D. (2007). Morphological component analysis: An adaptive thresholding strategy. IEEE Transactions on Image Processing, 16(11), 2675–2681.
Boyd, S., & Vandenberghe, L. (2004). Convex optimization. Cambridge: Cambridge University Press.
Bryt, O., & Elad, M. (2008). Compression of facial images using the K-SVD algorithm. Journal of Visual Communication and Image Representation, 19(4), 270–282.
Candes, E. (2006). Compressive sampling. International Congress of Mathematicians, 3, 1433–1452.
Castrodad, A., & Sapiro, G. (2012). Sparse modeling of human actions from motion imagery. International Journal of Computer Vision, 100, 1–15.
Cooley, J. W., & Tukey, J. W. (1965). An algorithm for the machine calculation of complex Fourier series. Mathematics of Computation, 19, 297–301.
Deng, W. H., Hu, J. N., & Guo, J. (2012). Extended SRC: Undersampled face recognition via intraclass variation dictionary. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(9), 1864–1870.
Duda, R., Hart, P., & Stork, D. (2000). Pattern classification (2nd ed.). New York: Wiley-Interscience.
Elad, M., & Aharon, M. (2006). Image denoising via sparse and redundant representations over learned dictionaries. IEEE Transactions on Image Processing, 15(12), 3736–3745.
Engan, K., Aase, S. O., & Husoy, J. H. (1999). Method of optimal directions for frame design. In: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing
Fernando, B., Fromont, E., & Tuytelaars, T. (2012). Effective use of frequent itemset mining for image classification. In: Proceedings of the European Conference on Computer Vision
Gehler, P., & Nowozin, S. (2009). On feature combination for multiclass object classification. In: Proceedings of the International Conference on Computer Vision
Georghiades, A., Belhumeur, P., & Kriegman, D. (2001). From few to many: Illumination cone models for face recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 23(6), 643–660.
Gross, R., Matthews, I., Cohn, J., Kanade, T., & Baker, S. (2010). Multi-PIE. Image and Vision Computing, 28, 807–813.
Guha, T., & Ward, R. K. (2012). Learning sparse representations for human action recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(8), 1576–1588.
Guo, Y., Li, S., Yang, J., Shu, T., & Wu, L. (2003). A generalized Foley–Sammon transform based on generalized Fisher discrimination criterion and its application to face recognition. Pattern Recognition Letters, 24(1), 147–158.
Hoyer, P. O. (2002). Non-negative sparse coding. In: Proceedings of the IEEE Workshop Neural Networks for Signal Processing
Huang, K., & Aviyente, S. (2006). Sparse representation for signal classification. In: Proceedings of the Neural Information Processing Systems
Hull, J. J. (1994). A database for handwritten text recognition research. IEEE Transactions on Pattern Analysis and Machine Intelligence, 16(5), 550–554.
Jenatton, R., Mairal, J., Obozinski, G., & Bach, F. (2011). Proximal methods for hierarchical sparse coding. Journal of Machine Learning Research, 12, 2297–2334.
Jia, Y. Q., Nie, F. P., & Zhang, C. S. (2009). Trace ratio problem revisited. IEEE Transactions on Neural Networks, 20(4), 729–735.
Jiang, Z. L., Lin, Z., & Davis, L. S. (2013). Label consistent K-SVD: Learning a discriminative dictionary for recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(11), 2651–2664.
Jiang, Z. L., Zhang, G. X., & Davis, L. S. (2012). Submodular dictionary learning for sparse coding. In: Proceedings of the IEEE Conference Computer Vision and Pattern Recognition
Kim, S. J., Koh, K., Lustig, M., Boyd, S., & Gorinevsky, D. (2007). An interior-point method for large-scale \(l_{1}\)-regularized least squares. IEEE Journal on Selected Topics in Signal Processing, 1, 606–617.
Kong, S., & Wang, D. H. (2012). A dictionary learning approach for classification: Separating the particularity and the commonality. In: Proceedings of the European Conference on Computer Vision.
Li, H., Jiang, T., & Zhang, K. (2006). Efficient and robust feature extraction by maximum margin criterion. IEEE Transactions on Neural Networks, 17(1), 157–165.
Lian, X. C., Li, Z. W., Lu, B. L., & Zhang, L. (2010). Max-margin dictionary learning for multi-class image categorization. In: Proceedings of the European Conference on Computer Vision
Mairal, J., Bach, F., & Ponce, J. (2012). Task-driven dictionary learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(4), 791–804.
Mairal, J., Bach, F., Ponce, J., Sapiro, G., & Zisserman, A. (2008b). Learning discriminative dictionaries for local image analysis. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
Mairal, J., Bach, F., Ponce, J., Sapiro, G., & Zisserman, A. (2009). Supervised dictionary learning. In: Proceedings of the Neural Information Processing Systems
Mairal, J., Elad, M., & Sapiro, G. (2008a). Sparse representation for color image restoration. IEEE Transactions on Image Processing, 17(1), 53–69.
Mairal, J., Leordeanu, M., Bach, F., Hebert, M., & Ponce, J. (2008c). Discriminative sparse image models for class-specific edge detection and image interpretation. In: Proceedings of the European Conference on Computer Vision
Mallat, S. (1999). A wavelet tour of signal processing (2nd ed.). San Diego: Academic Press.
Martinez, A., & Benavente, R. (1998). The AR face database. CVC Technical Report #24.
Nesterov, Y., & Nemirovskii, A. (1994). Interior-point polynomial algorithms in convex programming. Philadelphia: SIAM.
Nilsback, M., & Zisserman, A. (2006). A visual vocabulary for flower classification. In: Proceedings of the IEEE Conference Computer Vision and Pattern Recognition
Okatani, T., & Deguchi, K. (2007). On the Wiberg algorithm for matrix factorization in the presence of missing components. International Journal of Computer Vision, 72(3), 329–337.
Oliva, A., & Torralba, A. (2001). Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, 42, 145–174.
Olshausen, B. A., & Field, D. J. (1996). Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381, 607–609.
Olshausen, B. A., & Field, D. J. (1997). Sparse coding with an overcomplete basis set: A strategy employed by V1? Vision Research, 37(23), 3311–3325.
Pham, D., & Venkatesh, S. (2008). Joint learning and dictionary construction for pattern recognition. In: Proceedings of the IEEE Conference Computer Vision and Pattern Recognition
Phillips, P. J., Flynn, P. J., Scruggs, W. T., Bowyer, K. W., Chang, J., Hoffman, K., et al. (2005). Overview of the face recognition grand challenge. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
Qiu, Q., Jiang, Z. L., & Chellappa, R. (2011). Sparse dictionary-based representation and recognition of action attributes. In: Proceedings of the International Conference on Computer Vision
Ramirez, I., Sprechmann, P., & Sapiro, G. (2010). Classification and clustering via dictionary learning with structured incoherence and shared features. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
Rodriguez, F., & Sapiro, G. (2007). Sparse representation for image classification: Learning discriminative and reconstructive non-parametric dictionaries. IMA Preprint 2213.
Rodriguez, M., Ahmed, J., & Shah, M. (2008). A spatio-temporal maximum average correlation height filter for action recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
Rosasco, L., Verri, A., Santoro, M., Mosci, S., & Villa, S. (2009). Iterative Projection Methods for Structured Sparsity Regularization. MIT Technical Reports, MIT-CSAIL-TR-2009-050, CBCL-282.
Rubinstein, R., Bruckstein, A. M., & Elad, M. (2010). Dictionaries for sparse representation modeling. Proceedings of the IEEE, 98(6), 1045–1057.
Sadanand, S., & Corso, J. J. (2012). Action bank: A high-level representation of activity in video. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
Shen, L., Wang, S. H., Sun, G., Jiang, S. Q., & Huang, Q. M. (2013). Multi-level discriminative dictionary learning towards hierarchical visual categorization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
Song, F. X., Zhang, D., Mei, D. Y., & Guo, Z. W. (2007). A multiple maximum scatter difference discriminant criterion for facial feature extraction. IEEE Transactions on Systems, Man, and Cybernetics Part B, 37(6), 1599–1606.
Sprechmann, P., & Sapiro, G. (2010). Dictionary learning and sparse coding for unsupervised clustering. In: Proceedings of the International Conference on Acoustics Speech and Signal Processing
Szabo, Z., Poczos, B., & Lorincz, A. (2011). Online group-structured dictionary learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
Tropp, J. A., & Wright, S. J. (2010). Computational methods for sparse solution of linear inverse problems. Proceedings of the IEEE, 98(6), 948–958.
Turk, M., & Pentland, A. (1991). Eigenfaces for recognition. Journal of Cognitive Neuroscience, 3(1), 71–86.
Viola, P., & Jones, M. J. (2004). Robust real-time face detection. International Journal of Computer Vision, 57, 137–154.
Wagner, A., Wright, J., Ganesh, A., Zhou, Z. H., Mobahi, H., & Ma, Y. (2012). Toward a practical face recognition system: Robust alignment and illumination by sparse representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(2), 373–386.
Wang, H., Ullah, M., Klaser, A., Laptev, I., & Schmid, C. (2009). Evaluation of local spatio-temporal features for action recognition. In: Proceedings of the British Machine Vision Conference.
Wang, H., Yan, S.C., Xu, D., Tang, X.O., & Huang, T. (2007). Trace ratio versus ratio trace for dimensionality reduction. In: Proceedings of the IEEE Conference Computer Vision and Pattern Recognition.
Wang, H. R., Yuan, C. F., Hu, W. M., & Sun, C. Y. (2012). Supervised class-specific dictionary learning for sparse modeling in action recognition. Pattern Recognition, 45(11), 3902–3911.
Wright, J., Yang, A. Y., Ganesh, A., Sastry, S. S., & Ma, Y. (2009b). Robust face recognition via sparse representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(2), 210–227.
Wright, S. J., Nowak, R. D., & Figueiredo, M. A. T. (2009a). Sparse reconstruction by separable approximation. IEEE Transactions on Signal Processing, 57(7), 2479–2493.
Wu, Y. N., Si, Z. Z., Gong, H. F., & Zhu, S. C. (2010). Learning active basis model for object detection and recognition. International Journal of Computer Vision, 90, 198–235.
Xie, N., Ling, H., Hu, W., & Zhang, X. (2010). Use bin-ratio information for category and scene classification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
Yang, A.Y., Ganesh, A., Zhou, Z. H., Sastry, S. S., & Ma, Y. (2010a). A review of fast \(l_{1}\)-minimization algorithms for robust face recognition. arXiv:1007.3753v2.
Yang, J. C., Wright, J., Ma, Y., & Huang, T. (2008). Image super-resolution as sparse representation of raw image patches. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
Yang, J. C., Yu, K., Gong, Y., & Huang, T. (2009). Linear spatial pyramid matching using sparse coding for image classification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
Yang, J. C., Yu, K., & Huang, T. (2010b). Supervised translation-invariant sparse coding. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
Yang, M., & Zhang, L. (2010). Gabor feature based sparse representation for face recognition with Gabor occlusion dictionary. In: Proceedings of the European Conference on Computer Vision
Yang, M., Zhang, L., Feng, X. C., & Zhang, D. (2011b). Fisher discrimination dictionary learning for sparse representation. In: Proceedings of the International Conference on Computer Vision
Yang, M., Zhang, L., Yang, J., & Zhang, D. (2010c). Metaface learning for sparse representation based face recognition. In: Proceedings of the IEEE Conference on Image Processing
Yang, M., Zhang, L., Yang, J., & Zhang, D. (2011a). Robust sparse coding for face recognition. In: Proceedings of the IEEE Conference Computer Vision and Pattern Recognition
Yang, M., Zhang, L., & Zhang, D. (2012). Efficient misalignment robust representation for real-time face recognition. In: Proceedings of the European Conference on Computer Vision
Yao, A., Gall, J., & Gool, L. V. (2010). A hough transform-based voting framework for action recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
Ye, G. N., Liu, D., Jhuo, I.-H., & Chang, S.-F. (2012). Robust late fusion with rank minimization. In: Proceedings IEEE Conference on Computer Vision and Pattern Recognition
Yu, K., Xu, W., & Gong, Y. (2009). Deep learning with kernel regularization for visual recognition. In: Advances in Neural Information Processing Systems 21.
Yuan, X. T., & Yan, S. C. (2010). Visual classification with multitask joint sparse representation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
Zhang, L., Yang, M., & Feng, X. C. (2011). Sparse representation or collaborative representation: which helps face recognition?. In: Proceedings of the International Conference on Computer Vision
Zhang, Q., & Li, B. X. (2010). Discriminative K-SVD for dictionary learning in face recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
Zhang, Z. D., Ganesh, A., Liang, X., & Ma, Y. (2012). TILT: Transformation invariant low-rank textures. International Journal of Computer Vision, 99, 1–24.
Zhou, M. Y., Chen, H. J., Paisley, J., Ren, L., Li, L. B., Xing, Z. M., et al. (2012). Nonparametric Bayesian dictionary learning for analysis of noisy and incomplete images. IEEE Transactions on Image Processing, 21(1), 130–144.
Zhou, N., & Fan, J. P. (2012). Learning inter-related visual dictionary for object recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society B, 67(2), 301–320.
Communicated by K. Ikeuchi.
Appendices
Appendix 1: \({{\varvec{tr}}}({{\varvec{S}}}_{B}({{\varvec{X}}}))\) when \(\varvec{X}_{i}^j =0,\;j\ne i\)
Denote by \({{\varvec{m}}}_{i}^{i}, {{\varvec{m}}}_{i}\) and \({{\varvec{m}}}\) the mean vectors of \(\varvec{X}_{i}^i , {{\varvec{X}}}_{i}\) and \({{\varvec{X}}}\), respectively. Because \(\varvec{X}_{i}^j =0\) for \(j\ne i\), we can rewrite \(\varvec{m}_{i} =\left[ {\mathbf{0};\ldots ;\varvec{m}_{i}^i ;\ldots ;\mathbf{0}} \right] \) and \(\varvec{m}={\left[ {n_1 \varvec{m}_1^1 ;\ldots ;n_{i} \varvec{m}_{i}^i ;\ldots ;n_K \varvec{m}_K^K } \right] }\Big /n\). Therefore, the between-class scatter, i.e. \(tr\left( {\varvec{S}_B \left( \varvec{X} \right) } \right) =\sum _{i=1}^K {n_{i} \left\| {\varvec{m}_{i} -\varvec{m}} \right\| _2^2 } ,\) becomes
$$tr\left( \varvec{S}_B \left( \varvec{X} \right) \right) =\sum \nolimits _{i=1}^K n_{i} \left( \left( 1-n_{i}/n\right) ^{2}\left\| \varvec{m}_{i}^i \right\| _2^2 +\sum \nolimits _{j\ne i} \left( n_j /n\right) ^{2}\left\| \varvec{m}_{j}^j \right\| _2^2 \right) .$$
Denote \(\kappa _{i}=1-n_{i}/n\). After some derivation, the trace of \({{\varvec{S}}}_{B}({{\varvec{X}}})\) becomes
$$tr\left( \varvec{S}_B \left( \varvec{X} \right) \right) =\sum \nolimits _{i=1}^K \kappa _{i} n_{i} \left\| \varvec{m}_{i}^i \right\| _2^2 .$$
Because \(\varvec{m}_{i}^i \) is the mean representation vector of the samples from class \(i\), it will generally have non-negligible entries, so the trace of the between-class scatter will in general carry large energy.
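As an aside, this closed form is easy to verify numerically. The following NumPy sketch (not part of the original derivation; the class sizes, block dimensions, and variable names are illustrative) builds a coefficient matrix with the block support \(\varvec{X}_{i}^j =0\) for \(j\ne i\) and compares the directly computed \(tr(\varvec{S}_B(\varvec{X}))\) against \(\sum \nolimits _i \kappa _i n_i \left\| \varvec{m}_i^i \right\| _2^2\):

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = [4, 6, 10]          # n_i: number of samples per class
dims  = [3, 5, 2]           # rows of each coefficient block X_i^i
n, K  = sum(sizes), len(sizes)

# Build X with block support X_i^j = 0 for j != i.
X = np.zeros((sum(dims), n))
row_ofs = np.cumsum([0] + dims)
col = 0
for i, (ni, di) in enumerate(zip(sizes, dims)):
    X[row_ofs[i]:row_ofs[i] + di, col:col + ni] = rng.normal(size=(di, ni))
    col += ni

# Direct between-class scatter trace: sum_i n_i ||m_i - m||^2.
m = X.mean(axis=1)
tr_SB, means_block, col = 0.0, [], 0
for i, ni in enumerate(sizes):
    mi = X[:, col:col + ni].mean(axis=1)
    tr_SB += ni * np.sum((mi - m) ** 2)
    means_block.append(mi[row_ofs[i]:row_ofs[i] + dims[i]])  # m_i^i
    col += ni

# Closed form from the appendix: sum_i kappa_i n_i ||m_i^i||^2.
closed = sum((1 - ni / n) * ni * np.sum(mb ** 2)
             for ni, mb in zip(sizes, means_block))
assert np.isclose(tr_SB, closed)
```

The agreement holds exactly (up to floating point) for any block sizes, since collecting the cross terms in the expansion yields the coefficient \(\kappa _i n_i\) for each \(\left\| \varvec{m}_i^i \right\| _2^2\).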
Appendix 2: Derivation of the Simplified FDDL Model
Denote by \(\varvec{m}_{i}^i \) and \({{\varvec{m}}}_{i}\) the mean vectors of \(\varvec{X}_{i}^i \) and \({{\varvec{X}}}_{i}\), respectively. Because \(\varvec{X}_{i}^j =0\) for \(j\ne i\), we can rewrite \(\varvec{m}_{i} =\left[ {\mathbf{0};\ldots ;\varvec{m}_{i}^i ;\ldots ;\mathbf{0}} \right] \). So the within-class scatter becomes
$$\varvec{S}_W \left( \varvec{X} \right) =\sum \nolimits _{i=1}^K \left( \varvec{X}_{i} -\varvec{M}_{i} \right) \left( \varvec{X}_{i} -\varvec{M}_{i} \right) ^{T},$$
where \(\varvec{M}_{i}\) is the matrix whose \(n_{i}\) columns all equal \(\varvec{m}_{i}\), so that only the \(i\)th diagonal block of each summand is nonzero.
The trace of the within-class scatter is
$$tr\left( \varvec{S}_W \left( \varvec{X} \right) \right) =\sum \nolimits _{i=1}^K \left\| \varvec{X}_{i}^i -\varvec{M}_{i}^i \right\| _F^2 ,$$
where \(\varvec{M}_{i}^i =\left[ {\varvec{m}_{i}^i ,\ldots ,\varvec{m}_{i}^i } \right] \).
Based on Appendix 1, the trace of the between-class scatter is \(tr\left( {\varvec{S}_B \left( \varvec{X} \right) } \right) =\sum \nolimits _{i=1}^K {\kappa _{i} n_{i} \left\| {\varvec{m}_{i}^i } \right\| _2^2 } \), where \(\kappa _{i}=1-n_{i}/n\). Therefore the discriminative coefficient term, i.e. \(f\left( \varvec{X} \right) =tr\left( \varvec{S}_W \left( \varvec{X} \right) \right) -tr\left( \varvec{S}_B \left( \varvec{X} \right) \right) +\eta \left\| \varvec{X} \right\| _F^2 \), can be simplified to
$$f\left( \varvec{X} \right) =\sum \nolimits _{i=1}^K \left( \left\| \varvec{X}_{i}^i -\varvec{M}_{i}^i \right\| _F^2 -\kappa _{i} n_{i} \left\| \varvec{m}_{i}^i \right\| _2^2 +\eta \left\| \varvec{X}_{i}^i \right\| _F^2 \right) .$$
Denote by \(\varvec{E}_{i}^j =\left[ 1 \right] _{n_{i} \times n_j } \) the matrix of size \(n_{i}\times n_{j}\) with all entries being 1; then \(\varvec{M}_{i}^i =\left[ {\varvec{m}_{i}^i } \right] _{1\times n_{i} } ={\varvec{X}_{i}^i \varvec{E}_{i}^i }/{n_{i} }\). Because \({{\varvec{I}}}-({\varvec{E}_{i}^i }/{n_{i} })\left( {{\varvec{E}_{i}^i }/{n_{i} }} \right) ^{T}= ({{\varvec{I}}}-{\varvec{E}_{i}^i }/{n_{i} })({{\varvec{I}}}-{\varvec{E}_{i}^i }/{n_{i} })^{T}\), we have
$$\left\| \varvec{X}_{i}^i -\varvec{M}_{i}^i \right\| _F^2 =tr\left( \varvec{X}_{i}^i \left( \varvec{I}-{\varvec{E}_{i}^i }/{n_{i} }\right) \left( \varvec{X}_{i}^i \right) ^{T}\right) \quad \hbox {and}\quad n_{i} \left\| \varvec{m}_{i}^i \right\| _2^2 =tr\left( \varvec{X}_{i}^i \left( {\varvec{E}_{i}^i }/{n_{i} }\right) \left( \varvec{X}_{i}^i \right) ^{T}\right) .$$
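The idempotence identity \(\varvec{I}-(\varvec{E}_{i}^i /n_{i})(\varvec{E}_{i}^i /n_{i})^{T}=(\varvec{I}-\varvec{E}_{i}^i /n_{i})(\varvec{I}-\varvec{E}_{i}^i /n_{i})^{T}\) and the resulting centering trace form can both be checked numerically. The NumPy sketch below (illustrative sizes and names, not from the paper) does so:

```python
import numpy as np

rng = np.random.default_rng(1)
ni, di = 7, 4                    # illustrative: n_i samples, d_i atoms
E = np.ones((ni, ni))            # E_i^i: all-ones matrix
A = E / ni                       # the averaging matrix E_i^i / n_i
I = np.eye(ni)

# Idempotence identity used in the appendix (both sides equal I - E/n_i,
# since A is symmetric with A @ A = A).
assert np.allclose(I - A @ A.T, (I - A) @ (I - A).T)
assert np.allclose(I - A @ A.T, I - A)

# Consequence: with M_i^i = X_i^i E_i^i / n_i (each column the mean m_i^i),
#   ||X_i^i - M_i^i||_F^2 = tr( X_i^i (I - E/n_i) (X_i^i)^T ).
Xii = rng.normal(size=(di, ni))
Mii = Xii @ A
lhs = np.linalg.norm(Xii - Mii, 'fro') ** 2
rhs = np.trace(Xii @ (I - A) @ Xii.T)
assert np.isclose(lhs, rhs)
```

Because \(\varvec{E}_{i}^i /n_{i}\) is a symmetric idempotent (projection) matrix, both identities hold for any \(n_{i}\).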
Then the discriminative coefficient term can be written as
$$f\left( \varvec{X} \right) =\sum \nolimits _{i=1}^K \left( \left( 1+\kappa _{i} \right) \left\| \varvec{X}_{i}^i -\varvec{M}_{i}^i \right\| _F^2 +\left( \eta -\kappa _{i} \right) \left\| \varvec{X}_{i}^i \right\| _F^2 \right) .$$
With the constraint that \(\varvec{X}_{i}^j =0\) for \(j\ne i\) in Eq. (10), we have
With Eqs. (21) and (22), the simplified FDDL model [i.e., Eq. (10)] can be written as
where \({\lambda }'_1 ={\lambda _1 }/2,{\lambda }'_2 ={\lambda _2 \left( {1+\kappa _{i} } \right) }/2\), and \({\lambda }'_3 ={\lambda _2 \left( {\eta -\kappa _{i} } \right) }/2\).
Appendix 3: The convexity of \({{\varvec{f}}}_{i}({{\varvec{X}}})\)
Let \(\varvec{E}_{i}^j =\left[ 1 \right] _{n_{i} \times n_j } \) be a matrix of size \(n_{i} \times n_{j}\) with all entries being 1, and let \(\varvec{N}_{i} =\varvec{I}_{n_{i} \times n_{i} } -{\varvec{E}_{i}^i }/{n_{i} },\,\varvec{P}_{i} ={\varvec{E}_{i}^i }/{n_{i} }-{\varvec{E}_{i}^i }/n,\,\varvec{C}_{i}^j ={\varvec{E}_{i}^j }/n\), where \(\varvec{I}_{n_{i} \times n_{i} } \) is an identity matrix of size \(n_{i}\times n_{i}\).
From \(f_{i} \left( {\varvec{X}_{i} } \right) =\left\| {\varvec{X}_{i} -\varvec{M}_{i} } \right\| _F^2 -\sum \nolimits _{k=1}^K {\left\| {\varvec{M}_k -\varvec{M}} \right\| _F^2 } +\eta \left\| {\varvec{X}_{i} } \right\| _F^2 \), we can derive that
where \(\varvec{G}=\sum \nolimits _{k=1,k\ne i}^K {\varvec{X}_k \varvec{C}_k^i } ,\,\varvec{Z}_k ={\varvec{X}_k \varvec{E}_k^k }/{n_k }-\sum \nolimits _{j=1,j\ne i}^K \varvec{X}_j \varvec{C}_j^k\).
Rewrite \({{\varvec{X}}}_{i}\) as a column vector, \(\varvec{\chi }_{i} =\left[ {\varvec{r}_{i,1} ,\varvec{r}_{i,2} ,\ldots ,\varvec{r}_{i,d} } \right] ^{T}\), where \({{\varvec{r}}}_{i,j}\) is the \(j\)th row vector of \({{\varvec{X}}}_{i}\) and \(d\) is the total number of rows of \({{\varvec{X}}}_{i}\). Then \(f_{i}({{\varvec{X}}}_{i})\) equals
where \(\hbox {diag}({{\varvec{T}}})\) denotes the block diagonal matrix whose diagonal blocks all equal \({{\varvec{T}}}\), and \(\hbox {vec}({{\varvec{T}}})\) denotes the column vector obtained by concatenating all the column vectors of \({{\varvec{T}}}\).
The convexity of \(f_{i}(\varvec{\chi }_{i})\) depends on whether its Hessian matrix \(\nabla ^{2}f_{i}(\varvec{\chi }_{i})\) is positive definite (Boyd and Vandenberghe 2004). We can write the Hessian matrix of \(f_{i}(\varvec{\chi }_{i})\) as
\(\nabla ^{2}f_{i}(\varvec{\chi }_{i})\) will be positive definite if the following matrix \({{\varvec{S}}}\) is positive definite:
After some derivations, we have
In order to make \({{\varvec{S}}}\) positive definite, each eigenvalue of \({{\varvec{S}}}\) should be greater than 0. Because the maximal eigenvalue of \({{\varvec{E}}}_{i}^{i}\) is \(n_{i}\), we should ensure
Since \(n=n_{1}+n_{2}+\cdots +n_{K}\), this condition reduces to \(\eta >\kappa _{i}\), where \(\kappa _{i}=1-n_{i}/n\), which guarantees that \(f_{i}({{\varvec{X}}}_{i})\) is convex with respect to \({{\varvec{X}}}_{i}\).
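As a sanity check on this convexity condition, the following NumPy sketch (the sizes, seed, and helper \(f_{i}\) are illustrative, with the other classes' coefficients held fixed) tests midpoint convexity of \(f_{i}\) for \(\eta \) slightly above \(\kappa _{i}\), and exhibits a violation below \(\kappa _{i}\) along the constant-column direction associated with the maximal eigenvalue \(n_{i}\) of \({{\varvec{E}}}_{i}^{i}\):

```python
import numpy as np

rng = np.random.default_rng(2)
sizes = [5, 8, 7]                 # n_k samples per class
d, K = 6, len(sizes)
n, i = sum(sizes), 0              # vary class i = 0, others fixed
kappa_i = 1 - sizes[i] / n

X_fixed = [rng.normal(size=(d, nk)) for nk in sizes]

def f_i(Xi, eta):
    """f_i(X_i) = ||X_i - M_i||_F^2 - sum_k ||M_k - M||_F^2 + eta ||X_i||_F^2."""
    Xs = [Xi if k == i else X_fixed[k] for k in range(K)]
    m = np.hstack(Xs).mean(axis=1)                      # global mean
    val = np.linalg.norm(Xi - Xi.mean(axis=1, keepdims=True), 'fro') ** 2
    for k, Xk in enumerate(Xs):
        mk = Xk.mean(axis=1)
        val -= sizes[k] * np.sum((mk - m) ** 2)         # ||M_k - M||_F^2
    return val + eta * np.linalg.norm(Xi, 'fro') ** 2

# Midpoint convexity holds for eta slightly above kappa_i ...
eta = kappa_i + 0.1
for _ in range(200):
    A = rng.normal(size=(d, sizes[i]))
    B = rng.normal(size=(d, sizes[i]))
    assert f_i((A + B) / 2, eta) <= (f_i(A, eta) + f_i(B, eta)) / 2 + 1e-9

# ... and fails below kappa_i along the constant-column direction
# (all columns equal, i.e. the eigendirection of E_i^i with eigenvalue n_i).
V = np.outer(rng.normal(size=d), np.ones(sizes[i]))
eta = kappa_i - 0.1
assert f_i(np.zeros((d, sizes[i])), eta) > (f_i(V, eta) + f_i(-V, eta)) / 2
```

Along the constant-column direction the within-class term vanishes, so the quadratic part of \(f_{i}\) reduces to \(n_{i}\left( \eta -\kappa _{i}\right) \left\| \varvec{v} \right\| _2^2\), which is exactly where the bound \(\eta >\kappa _{i}\) becomes tight.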
Yang, M., Zhang, L., Feng, X. et al. Sparse Representation Based Fisher Discrimination Dictionary Learning for Image Classification. Int J Comput Vis 109, 209–232 (2014). https://doi.org/10.1007/s11263-014-0722-8