
Generalized robust linear discriminant analysis for jointly sparse learning

Published in Applied Intelligence.

Abstract

Linear discriminant analysis (LDA) is a well-known supervised method that performs dimensionality reduction and feature extraction effectively. However, traditional LDA-based methods must be turned into the trace ratio form to compute a closed-form solution, which requires the within-class scatter matrix to be nonsingular. In this article, we design a new method, named generalized robust linear discriminant analysis (GRLDA), to tackle this disadvantage and improve robustness. GRLDA imposes the \({L}_{\mathrm{2,1}}\)-norm on both the loss function, to reduce the influence of outliers, and the regularization term, to obtain joint sparsity. An intrinsic graph and a penalty graph are constructed to characterize intraclass similarity and interclass separability, respectively. A novel optimization method is proposed to solve the model, in which a quadratic problem on the Stiefel manifold is solved to avoid inverting a singular matrix. We also analyze the computational complexity rigorously. Finally, experimental results on face, object, and medical images exhibit the superiority of GRLDA.
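The \({L}_{\mathrm{2,1}}\)-norm mentioned in the abstract is the sum of the Euclidean norms of a matrix's rows; penalizing it zeroes out entire rows of the projection matrix, which is what "joint sparsity" refers to. A minimal numerical illustration (not the authors' implementation; the matrix `W` is made-up data):

```python
import numpy as np

def l21_norm(W):
    """L2,1-norm: sum of the Euclidean norms of the rows of W.

    Penalizing this norm drives whole rows of W to zero, so a
    discarded feature is discarded in every projection direction
    at once -- the "joint sparsity" of the abstract.
    """
    return float(np.sum(np.linalg.norm(W, axis=1)))

W = np.array([[3.0, 4.0],   # row norm 5
              [0.0, 0.0],   # zero row contributes nothing
              [0.0, 1.0]])  # row norm 1
print(l21_norm(W))  # 6.0
```

By contrast, the Frobenius norm of the same matrix would spread the penalty over individual entries rather than whole rows, which is why the \({L}_{\mathrm{2,1}}\)-norm is the standard choice for row-wise feature selection.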



Data availability

The datasets analyzed during the current study are available in the following databases.

(1) COIL20 database (http://www.cs.columbia.edu/CAVE/software/softlib/coil-20.php).

(2) FERET database (http://www.nist.gov/itl/iad/ig/colorferet.cfm).

(3) ORL database (http://www.uk.research.att.com/facedatabase.html).

(4) AR database (http://www2.ece.ohio-state.edu/~aleix/ARdatabase.html).

(5) MedMNIST database (https://medmnist.com/).

(6) Extended Yale B database (https://www.kaggle.com/datasets/tbourton/extyalebcroppedpng).


Author information

Authors and Affiliations

Authors

Contributions

All authors contributed to the study conception and design. Material preparation, data collection and analysis were performed by Yufei Zhu, Zhihui Lai, Can Gao, and Heng Kong. The first draft of the manuscript was written by Yufei Zhu and Zhihui Lai, and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Heng Kong.

Ethics declarations

Funding

This work was supported in part by the Natural Science Foundation of China under Grant 62272319, and in part by the Natural Science Foundation of Guangdong Province (Grant 2023A1515010677, 2024A1515011637) and Science and Technology Planning Project of Shenzhen Municipality under Grant JCYJ20210324094413037 and JCYJ20220818095803007.

Ethical and informed consent for data used

The authors confirm that informed consent was obtained from all participants whose data were used in this research.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix

First, we introduce a lemma as follows:

Lemma 3

Assume that f(x) is a convex function of x, where x may be a scalar, a vector, or a matrix variable. Then:

$$\mathrm{f}({\mathrm{x}}_{1})-\mathrm{f}({\mathrm{x}}_{2})\ge \mathrm{Tr}\left({({\mathrm{f}}^{\prime}({\mathrm{x}}_{2}))}^{\mathrm{T}}({\mathrm{x}}_{1}-{\mathrm{x}}_{2})\right)$$
(28)

where \({\mathrm{f}}^{\prime}({\mathrm{x}}_{2})\) is a subgradient of f(x) at \({\mathrm{x}}_{2}\).
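The inequality (28) can be spot-checked numerically for a simple convex function such as \(\mathrm{f}(\mathrm{x})={\mathrm{x}}^{\mathrm{T}}\mathrm{x}\), whose gradient is \(2\mathrm{x}\). This sketch is a sanity check on the lemma, not part of the proof; the random vectors are made-up test data:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    return float(x @ x)   # convex: f(x) = ||x||^2

def grad_f(x):
    return 2.0 * x        # its gradient, a valid subgradient

# For a convex f, f(x1) - f(x2) >= <grad f(x2), x1 - x2> for all x1, x2.
for _ in range(1000):
    x1, x2 = rng.standard_normal(5), rng.standard_normal(5)
    assert f(x1) - f(x2) >= grad_f(x2) @ (x1 - x2) - 1e-9
print("subgradient inequality held on all random trials")
```

For vector arguments the trace in (28) reduces to the ordinary inner product used in the assertion above.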

Proof of Theorem 2

By assumption, \({\mathrm{f}}_{\mathrm{i}}({\mathrm{h}}_{\mathrm{i}}(\mathrm{m}))\) is convex w.r.t. \({\mathrm{h}}_{\mathrm{i}}(\mathrm{m})\) under the constraint \(\mathrm{m}\in\Omega\), and \({\mathrm{f}}_{\mathrm{i}}({\mathrm{h}}_{\mathrm{i}}(\mathrm{m}))\le 0\). In the t-th iteration, denote \({\mathrm{G}}_{\mathrm{i}}^{\mathrm{t}}={\mathrm{f}}_{\mathrm{i}}^{\prime}({\mathrm{h}}_{\mathrm{i}}({\mathrm{m}}^{\mathrm{t}-1}))\). For each i, by Lemma 3 we have:

$$\begin{array}{l}{\mathrm{f}}_{\mathrm{i}}({\mathrm{h}}_{\mathrm{i}}({\mathrm{m}}^{\mathrm{t}}))-{\mathrm{f}}_{\mathrm{i}}({\mathrm{h}}_{\mathrm{i}}({\mathrm{m}}^{\mathrm{t}-1}))\\ \ge \mathrm{Tr}({({\mathrm{G}}_{\mathrm{i}}^{\mathrm{t}})}^{\mathrm{T}}{\mathrm{h}}_{\mathrm{i}}({\mathrm{m}}^{\mathrm{t}}))-\mathrm{Tr}({({\mathrm{G}}_{\mathrm{i}}^{\mathrm{t}})}^{\mathrm{T}}{\mathrm{h}}_{\mathrm{i}}({\mathrm{m}}^{\mathrm{t}-1}))\end{array}$$
(29)

According to (21), the following can be derived:

$$\mathrm{Tr}({({\mathrm{G}}_{\mathrm{i}}^{\mathrm{t}})}^{\mathrm{T}}{\mathrm{h}}_{\mathrm{i}}({\mathrm{m}}^{\mathrm{t}}))\ge \mathrm{Tr}({({\mathrm{G}}_{\mathrm{i}}^{\mathrm{t}})}^{\mathrm{T}}{\mathrm{h}}_{\mathrm{i}}({\mathrm{m}}^{\mathrm{t}-1}))$$
(30)

Combining (29) with (30) and summing over i, we have:

$$\sum_{\mathrm{i}}{\mathrm{f}}_{\mathrm{i}}({\mathrm{h}}_{\mathrm{i}}({\mathrm{m}}^{\mathrm{t}}))\ge \sum_{\mathrm{i}}{\mathrm{f}}_{\mathrm{i}}({\mathrm{h}}_{\mathrm{i}}({\mathrm{m}}^{\mathrm{t}-1}))$$
(31)

Combining (31) with the bound \({\mathrm{f}}_{\mathrm{i}}({\mathrm{h}}_{\mathrm{i}}(\mathrm{m}))\le 0\), the value of the objective function (16) increases monotonically and is bounded above, and therefore converges.
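The same monotonicity argument underlies the classic iteratively reweighted scheme for \({L}_{\mathrm{2,1}}\)-norm objectives. The toy sketch below is a generic reweighted least-squares loop for \(\min_{\mathrm{W}}\Vert \mathrm{XW}-\mathrm{Y}\Vert_{\mathrm{F}}^{2}+\lambda \Vert \mathrm{W}\Vert_{2,1}\), not the paper's Algorithm 1, and `X`, `Y`, `lam` are made-up data; since this is a minimization, the surrogate step drives the objective monotonically down, mirroring the monotone increase established above for the maximization (16):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 8))
Y = rng.standard_normal((50, 2))
lam, eps = 1.0, 1e-8

def objective(W):
    # ||XW - Y||_F^2 + lam * ||W||_{2,1}
    return np.linalg.norm(X @ W - Y) ** 2 + lam * float(np.sum(np.linalg.norm(W, axis=1)))

W = np.linalg.lstsq(X, Y, rcond=None)[0]   # start from the least-squares fit
obj0 = prev = objective(W)
for _ in range(50):
    # Row weights d_i = 1 / (2 ||w_i||); solving the reweighted normal
    # equations minimizes a surrogate that majorizes the objective,
    # so the true objective cannot increase at any sweep.
    d = 1.0 / (2.0 * np.maximum(np.linalg.norm(W, axis=1), eps))
    W = np.linalg.solve(X.T @ X + lam * np.diag(d), X.T @ Y)
    cur = objective(W)
    assert cur <= prev + 1e-6   # monotone, up to numerical tolerance
    prev = cur
print("objective was monotone over all 50 sweeps")
```

A bounded, monotone sequence of objective values converges, which is the proof pattern Theorem 2 applies to the GRLDA iteration.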

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article


Cite this article

Zhu, Y., Lai, Z., Gao, C. et al. Generalized robust linear discriminant analysis for jointly sparse learning. Appl Intell 54, 9508–9523 (2024). https://doi.org/10.1007/s10489-024-05632-6
