
Discriminative least squares regression for multiclass classification based on within-class scatter minimization

Abstract

Least squares regression has been widely used in pattern classification due to its compact form and efficient solution. However, two main issues limit its performance on multiclass classification problems. First, employing hard discrete labels as the regression targets is inappropriate for multiclass classification. Second, it focuses only on exactly fitting the instances to the target matrix while ignoring the within-class similarity of the instances, which results in overfitting. To address these issues, we propose a discriminative least squares regression for multiclass classification based on within-class scatter minimization (WCSDLSR). Specifically, an ε-dragging technique is first introduced to relax the hard discrete labels into slack soft labels, which enlarges the between-class margins of the soft labels as much as possible. The within-class scatter of the soft labels is then constructed as a regularization term to make transformed instances of the same class closer to each other. These factors ensure that WCSDLSR learns a more compact and discriminative transformation for classification, thus avoiding overfitting. Furthermore, the proposed WCSDLSR obtains a closed-form solution in each iteration with low computational cost. Experimental results on benchmark datasets demonstrate that the proposed WCSDLSR achieves better classification performance with lower computational costs.
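To make the two ingredients concrete, the following is a minimal NumPy sketch of the ε-dragging target relaxation and the within-class scatter regularizer. It assumes a ±1 label encoding and uses the standard ε-dragging closed form; the function and variable names are illustrative and not taken from the authors' implementation.

```python
import numpy as np

def epsilon_dragging_targets(X, W, Y):
    """Relax hard +/-1 labels Y into slack soft labels Y + Y * E (Hadamard).

    Standard epsilon-dragging closed form: each entry is dragged away from
    its hard label in the margin-enlarging direction (upwards for the true
    class, downwards for the others), with nonnegative dragging amounts E.
    """
    E = np.maximum(Y * (X @ W - Y), 0.0)   # nonnegative dragging matrix
    return Y + Y * E                       # relaxed regression targets

def within_class_scatter(T, labels):
    """Within-class scatter of the transformed instances (rows of T).

    Sum over classes of squared distances to the class mean; penalizing this
    quantity keeps transformed instances of the same class close together.
    """
    total = 0.0
    for k in np.unique(labels):
        Tk = T[labels == k]
        total += np.sum((Tk - Tk.mean(axis=0)) ** 2)
    return total
```

In the full model these quantities are optimized jointly rather than evaluated once for a fixed transformation; the sketch only illustrates the two terms that enter the objective.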


Acknowledgements

This work is supported by the National Natural Science Foundation of China (No. 61772020).

Author information

Corresponding author

Correspondence to Shuisheng Zhou.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest related to this work.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A: Proof of Theorem 1

Proof

For simplicity, let L denote the augmented Lagrangian of the optimization problem (8). The KKT conditions for (8) are derived as follows (since the subproblems for W and E do not involve the Lagrange multipliers, the KKT conditions for them are omitted):

$$ \begin{array}{@{}rcl@{}} \textbf{U}=\textbf{T}. \end{array} $$
(20)
$$ \begin{array}{@{}rcl@{}} \frac{\partial{L}}{\partial{\textbf{T}}}&=&(1+\alpha+\mu)\textbf{T}\\&&-\left( \textbf{X}\textbf{W}+\alpha(\textbf{Y}+\textbf{Y}\odot\textbf{E}) +\mu\textbf{U}-\textbf{H}\right)=0. \end{array} $$
(21)
$$ \begin{array}{@{}rcl@{}} \frac{\partial{L}}{\partial{\textbf{U}_{j}}}&=&\left( (\beta+\mu)\textbf{I}_{n_{j}}-\frac{\beta}{n_{j}}\textbf{1}_{n_{j}}\right)\textbf{U}_{j} \\&&-(\mu\textbf{T}_{j}+\textbf{H}_{j})=0. \end{array} $$
(22)

First, the update of the Lagrange multiplier H can be obtained from Algorithm 1 as follows

$$ \begin{array}{@{}rcl@{}} \textbf{H}^{k}=\textbf{H}+\mu(\textbf{T}-\textbf{U}). \end{array} $$
(23)

Since (23) gives \(\textbf{H}^{k}-\textbf{H}=\mu(\textbf{T}-\textbf{U})\) with \(\mu>0\), if the sequence \(\{\textbf {H}^{k}\}^{\infty }_{k=1}\) converges to a stationary point, i.e., \((\textbf {H}^k-\textbf {H})\rightarrow 0\), then \((\textbf {T}-\textbf {U})\rightarrow 0\). Thus, the first KKT condition (20) is proved.

For the second KKT condition, the following equation can be obtained from (10)

$$ \begin{array}{@{}rcl@{}} \textbf{T}^{k}-\textbf{T} &=&(1+\alpha+\mu)^{-1}\left( \textbf{X}\textbf{W}+\alpha(\textbf{Y}+\textbf{Y}\odot\textbf{E})\right.\\&&\left.+\mu\textbf{U}-\textbf{H}\right)-\textbf{T} \end{array} $$
(24)

From (24), the following equation can be obtained

$$ \begin{array}{@{}rcl@{}} &&(1+\alpha+\mu)(\textbf{T}^{k}-\textbf{T})\\ &=&\left( \textbf{X}\textbf{W}+\alpha(\textbf{Y}+\textbf{Y}\odot\textbf{E})+\mu\textbf{U}-\textbf{H}\right)-(1+\alpha+\mu)\textbf{T} \end{array} $$
(25)

Equation (25) indicates that when the sequence \(\{\textbf {T}^{k}\}^{\infty }_{k=1}\) converges to a stationary point, i.e., \((\textbf {T}^k-\textbf {T})\rightarrow 0\), the second KKT condition (21) holds.

For the third KKT condition (22), the following equation is obtained by using the results of (12)

$$ \begin{array}{@{}rcl@{}} \textbf{U}_{j}^{k} - \textbf{U}_{j} = \left( (\beta+\mu)\textbf{I}_{n_{j}} - \frac{\beta}{n_{j}}\textbf{1}_{n_{j}}\right)^{-1} (\mu\textbf{T}_{j} + \textbf{H}_{j}) - \textbf{U}_{j} \end{array} $$
(26)

which is equivalent to

$$ \begin{array}{@{}rcl@{}} &\left( (\beta+\mu)\textbf{I}_{n_{j}}-\frac{\beta}{n_{j}}\textbf{1}_{n_{j}}\right)(\textbf{U}_{j}^{k}-\textbf{U}_{j})\\ &=(\mu\textbf{T}_{j}+\textbf{H}_{j})-\left( (\beta+\mu)\textbf{I}_{n_{j}}-\frac{\beta}{n_{j}}\textbf{1}_{n_{j}}\right)\textbf{U}_{j} \end{array} $$
(27)

Equation (27) indicates that when the sequence \(\{\textbf {U}^{k}\}^{\infty }_{k=1}\) converges to a stationary point, i.e., \((\textbf {U}^k-\textbf {U})\rightarrow 0\), the third KKT condition (22) holds.

In summary, if the sequence of solutions \(\{{{\varTheta }}^k\}^{\infty }_{k=1}\) is bounded and \(\lim _{k\rightarrow \infty }({{\varTheta }}^{k+1}-{{\varTheta }}^{k})=0\), then the limit points of \(\{{{\varTheta }}^k\}^{\infty }_{k=1}\) are Karush-Kuhn-Tucker (KKT) points of problem (8). Thus, the proof is complete. □
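For illustration, here is a minimal NumPy sketch of an ADMM-style loop built from the closed-form updates (21)-(23) quoted above, with the stopping test based on the KKT residual T − U of condition (20). It is a sketch under stated assumptions rather than the authors' implementation: the label matrix Y is assumed ±1-coded, `labels` holds each instance's class index, \(\textbf{1}_{n_{j}}\) is read as the n_j × n_j all-ones matrix, and the W- and E-updates (whose KKT conditions are omitted above) are replaced by a plain ridge solve with weight `lam` and the standard ε-dragging closed form.

```python
import numpy as np

def wcsdlsr_admm_sketch(X, Y, labels, alpha=1.0, beta=1.0, lam=1e-3,
                        mu=1.0, n_iter=50, tol=1e-6):
    """ADMM-style iteration sketch following the closed-form updates (21)-(23).

    Assumptions not taken from the paper: the W-update is a plain ridge solve
    and the E-update uses the standard epsilon-dragging closed form; both are
    placeholders for the updates omitted in the appendix.
    """
    n, c = Y.shape
    W = np.zeros((X.shape[1], c))
    T = Y.astype(float).copy()
    U = T.copy()
    H = np.zeros((n, c))
    classes = [np.flatnonzero(labels == k) for k in np.unique(labels)]

    XtX = X.T @ X + lam * np.eye(X.shape[1])          # assumed ridge term
    for _ in range(n_iter):
        W = np.linalg.solve(XtX, X.T @ T)             # assumed W-update
        E = np.maximum(Y * (X @ W - Y), 0.0)          # assumed E-update
        # T-update, cf. (21): closed form in each iteration
        T = (X @ W + alpha * (Y + Y * E) + mu * U - H) / (1.0 + alpha + mu)
        # U-update per class, cf. (22): within-class scatter regularization
        for idx in classes:
            nj = idx.size
            A = (beta + mu) * np.eye(nj) - (beta / nj) * np.ones((nj, nj))
            U[idx] = np.linalg.solve(A, mu * T[idx] + H[idx])
        # multiplier update, cf. (23); T - U -> 0 is the KKT residual (20)
        H = H + mu * (T - U)
        if np.linalg.norm(T - U) < tol:
            break
    return W
```

Note that the per-class system matrix has eigenvalues μ (on the all-ones direction) and β + μ (elsewhere), so it is invertible whenever μ > 0; being a rank-one modification of a scaled identity, it could also be inverted analytically via the Sherman-Morrison formula.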


Cite this article

Ma, J., Zhou, S. Discriminative least squares regression for multiclass classification based on within-class scatter minimization. Appl Intell 52, 622–635 (2022). https://doi.org/10.1007/s10489-021-02258-w
