Abstract
For classification problems, training accurate and sparse models within limited time has been a longstanding challenge, and it becomes harder when datasets contain a large number of irrelevant and redundant features. Feature selection is often used to address this problem, but it suffers from high time complexity and a trade-off between the number of selected features and the predictive error. As a solution, we propose an accelerated multi-kernel sparse stochastic optimization classifier (AMSSOC) algorithm, which reconstructs the \(\ell _{1}\)-norm support vector classifier by introducing an \(\ell _{0}\)-norm function. AMSSOC minimizes the number of features by applying an approximate \(\ell _{0}\)-norm penalty to the support vector classifier. Meanwhile, a novel column-wise multi-kernel matrix is integrated into the classifier to select key features and obtain explainable predictions. Moreover, AMSSOC employs Nesterov's method to accelerate training. Our main goal is to find a feature subset that is as small as possible while improving prediction accuracy and explainability. Experimental results show that the proposed AMSSOC outperforms four existing classification methods on thirteen real-world and two real image datasets. At the same time, critical feature subsets are discovered, which play an important role in explainable prediction in practical applications.
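The paper's full formulation is not reproduced in this excerpt, so the following is only a minimal illustrative sketch of the two ingredients the abstract names: an approximate \(\ell _{0}\)-norm penalty (here a common exponential surrogate, \(\sum_j (1 - e^{-\alpha |w_j|})\), which is an assumption, not necessarily the paper's choice) added to a hinge-loss linear classifier, minimized with Nesterov's accelerated gradient. All function names and hyperparameters are hypothetical.

```python
import numpy as np

def grad_approx_l0(w, alpha=5.0):
    # Gradient of the smooth l0 surrogate sum_j (1 - exp(-alpha*|w_j|));
    # the subgradient 0 is taken at w_j = 0 via np.sign.
    return alpha * np.exp(-alpha * np.abs(w)) * np.sign(w)

def train_sparse_svm(X, y, lam=0.1, alpha=5.0, lr=0.01, epochs=200):
    """Hinge-loss linear classifier with an approximate l0 penalty,
    minimized by Nesterov's accelerated gradient (full-batch sketch)."""
    n, d = X.shape
    w = np.zeros(d)
    v = np.zeros(d)          # momentum buffer
    mu = 0.9                 # momentum coefficient
    for _ in range(epochs):
        w_look = w + mu * v              # Nesterov look-ahead point
        margins = y * (X @ w_look)
        active = margins < 1.0           # samples violating the margin
        grad_hinge = -(X[active].T @ y[active]) / n
        g = grad_hinge + lam * grad_approx_l0(w_look, alpha)
        v = mu * v - lr * g
        w = w + v
    return w

# Toy usage: two informative features among ten; the rest are noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = np.sign(X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.normal(size=200))
w = train_sparse_svm(X, y)
acc = np.mean(np.sign(X @ w) == y)
```

The penalty's gradient is largest near zero, so weights of uninformative features are driven back to (near) zero while large weights are barely shrunk — the qualitative behavior the abstract attributes to the approximate \(\ell _{0}\)-norm term.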







Data availability
In our experiments, all classification datasets are public datasets from the UCI Machine Learning Repository. They are available online.
Acknowledgements
The authors would like to thank the editors and anonymous reviewers for their valuable comments and suggestions. This research has been partially supported by the Natural Science Foundation of Jiangsu under grant BK20241903, in part by the Major Program of the Natural Science Foundation of the Jiangsu Higher Education Institutions of China under grant 22KJA520003, in part by the National Natural Science Foundation of China under grant 61877061, in part by the Key Program of the National Natural Science Foundation of China under grants 92046026 and 91646204, in part by the Jiangsu Provincial Key Research and Development Program under grant BE2020001-3, and in part by the Jiangsu Provincial Policy Guidance Program under grant BZ2020008.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Informed consent
Not applicable—no consent was needed to carry out this research.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Chen, Z., Zhang, Z., Li, S. et al. Accelerated multi-kernel sparse stochastic optimization classifier algorithm for explainable prediction. Pattern Anal Applic 27, 144 (2024). https://doi.org/10.1007/s10044-024-01367-9