Abstract
Least squares regression has been widely used in pattern classification due to its compact form and efficient solution. However, two main issues limit its performance in multiclass classification. First, employing hard discrete labels as the regression targets is inappropriate for multiclass classification. Second, it focuses only on exactly fitting the instances to the target matrix while ignoring the within-class similarity of the instances, which results in overfitting. To address these issues, we propose a discriminative least squares regression for multiclass classification based on within-class scatter minimization (WCSDLSR). Specifically, an ε-dragging technique is first introduced to relax the hard discrete labels into slack soft labels, which enlarges the between-class margins of the soft labels as much as possible. The within-class scatter of the soft labels is then constructed as a regularization term that draws transformed instances of the same class closer to each other. These factors ensure that WCSDLSR learns a more compact and discriminative transformation for classification, thus avoiding overfitting. Furthermore, WCSDLSR obtains a closed-form solution in each iteration at low computational cost. Experimental results on benchmark datasets demonstrate that WCSDLSR achieves better classification performance with lower computational cost than competing methods.
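The procedure the abstract describes (ε-dragging relaxation of one-hot targets, a within-class scatter regularizer on the transformed instances, and a closed-form update per iteration) can be sketched as follows. This is a minimal NumPy illustration reconstructed only from the abstract, not the authors' exact formulation: the function names and parameter values are assumptions, and the bias term is omitted (a constant feature column can be appended instead).

```python
import numpy as np

def wcs_dlsr(X, y, lam=0.1, beta=0.1, n_iter=20):
    """Sketch of LSR with epsilon-dragging and within-class scatter
    regularization (assumed formulation; not the paper's exact model).
    X: (n, d) data matrix, y: (n,) integer labels in 0..C-1."""
    n, d = X.shape
    C = int(y.max()) + 1
    Y = np.eye(C)[y]          # one-hot target matrix, shape (n, C)
    B = 2 * Y - 1             # dragging directions: +1 for the true class, -1 otherwise
    M = np.zeros((n, C))      # nonnegative dragging magnitudes

    # Within-class centering operator Lw: tr(W^T X^T Lw X W) equals the
    # within-class scatter of the transformed instances X @ W.
    Lw = np.zeros((n, n))
    for c in range(C):
        idx = np.where(y == c)[0]
        nc = len(idx)
        Jc = np.eye(nc) - np.ones((nc, nc)) / nc  # centering within class c
        Lw[np.ix_(idx, idx)] = Jc

    # Fixed system matrix for the closed-form W-update in every iteration.
    A = X.T @ X + lam * np.eye(d) + beta * X.T @ Lw @ X
    for _ in range(n_iter):
        T = Y + B * M                        # relaxed soft targets
        W = np.linalg.solve(A, X.T @ T)      # closed-form ridge-style update
        M = np.maximum(B * (X @ W - Y), 0)   # epsilon-dragging update, kept nonnegative
    return W

def predict(W, X):
    """Assign each instance to the class with the largest regression score."""
    return np.argmax(X @ W, axis=1)
```

On two well-separated synthetic blobs (with a ones column appended as a surrogate bias), the learned `W` classifies the training instances correctly, which illustrates the mechanics rather than reproducing the paper's benchmarks.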
Acknowledgements
This work is supported by the National Natural Science Foundation of China (No. 61772020).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest regarding this work.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix A: Proof of Theorem 1
Proof
For simplicity, let L denote the optimization problem (8). The KKT conditions for (8) are derived as follows (since the updates of W and E do not involve the Lagrange multipliers, their KKT conditions are omitted):
First, the Lagrange multiplier H can be obtained from Algorithm 1 as follows
If sequence \(\{\textbf {H}^{k}\}^{\infty }_{k=1}\) converges to a stationary point, i.e., \((\textbf {H}^k-\textbf {H})\rightarrow 0\), then \((\textbf {T}-\textbf {U})\rightarrow 0\). Thus, the first KKT condition (20) is proved.
For the second KKT condition, the following equation can be obtained from (10)
From (24), the following equation can be obtained
(25) indicates that when the sequence \(\{\textbf {T}^{k}\}^{\infty }_{k=1}\) converges to a stationary point, i.e., \((\textbf {T}^k-\textbf {T})\rightarrow 0\), the second KKT condition (21) holds.
For the third KKT condition (22), the following equation is obtained by using the results of (12)
which is equivalent to
(27) indicates that when the sequence \(\{\textbf {U}^{k}\}^{\infty }_{k=1}\) converges to a stationary point, i.e., \((\textbf {U}^k-\textbf {U})\rightarrow 0\), the third KKT condition (22) holds.
In summary, the sequence \(\{{{\varTheta }}^k\}^{\infty }_{k=1}\) is bounded, and \(\lim _{k\rightarrow \infty }({{\varTheta }}^{k+1}-{{\varTheta }}^{k})=0\) implies that the limit points of \(\{{{\varTheta }}^k\}^{\infty }_{k=1}\) are KKT points of problem (8). This completes the proof. □
Rights and permissions
About this article
Cite this article
Ma, J., Zhou, S. Discriminative least squares regression for multiclass classification based on within-class scatter minimization. Appl Intell 52, 622–635 (2022). https://doi.org/10.1007/s10489-021-02258-w