Abstract
For classification problems, training accurate and sparse models within limited time has been a longstanding challenge, and it becomes harder when datasets contain a large number of irrelevant and redundant features. Feature selection is often used to address this problem, but it suffers from high time complexity and a trade-off between the number of selected features and the predictive error. As a solution, we propose an accelerated multi-kernel sparse stochastic optimization classifier (AMSSOC) algorithm, which reconstructs the \(\ell _{1}\)-norm support vector classifier by introducing an \(\ell _{0}\)-norm function. AMSSOC minimizes the number of features by applying an approximate \(\ell _{0}\)-norm penalty to the support vector classifier. Meanwhile, a novel column-wise multi-kernel matrix is integrated into the classifier to select key features and obtain explainable predictions. Moreover, AMSSOC employs Nesterov's method to accelerate training. Our main goal is to find a feature subset that is as small as possible while improving prediction accuracy and explainability. Experimental results show that the proposed AMSSOC outperforms four existing classification methods on thirteen real-world and two real image datasets. At the same time, critical feature subsets are discovered, which play an important role in explainable prediction in practical applications.
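The paper's full formulation is not reproduced in this excerpt, so the following is only a minimal illustrative sketch of the two ingredients the abstract names: an approximate \(\ell _{0}\)-norm penalty (here a common exponential surrogate, \(\sum_j (1 - e^{-\alpha |w_j|})\), which is an assumption, not necessarily the paper's choice) added to a hinge-loss linear classifier, minimized with Nesterov's accelerated gradient. All function names and hyperparameters are hypothetical.

```python
import numpy as np

def grad_approx_l0(w, alpha=5.0):
    # Gradient of the smooth l0 surrogate sum_j (1 - exp(-alpha*|w_j|));
    # the subgradient 0 is taken at w_j = 0 via np.sign.
    return alpha * np.exp(-alpha * np.abs(w)) * np.sign(w)

def train_sparse_svm(X, y, lam=0.1, alpha=5.0, lr=0.01, epochs=200):
    """Hinge-loss linear classifier with an approximate l0 penalty,
    minimized by Nesterov's accelerated gradient (full-batch sketch)."""
    n, d = X.shape
    w = np.zeros(d)
    v = np.zeros(d)          # momentum buffer
    mu = 0.9                 # momentum coefficient
    for _ in range(epochs):
        w_look = w + mu * v              # Nesterov look-ahead point
        margins = y * (X @ w_look)
        active = margins < 1.0           # samples violating the margin
        grad_hinge = -(X[active].T @ y[active]) / n
        g = grad_hinge + lam * grad_approx_l0(w_look, alpha)
        v = mu * v - lr * g
        w = w + v
    return w

# Toy usage: two informative features among ten; the rest are noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = np.sign(X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.normal(size=200))
w = train_sparse_svm(X, y)
acc = np.mean(np.sign(X @ w) == y)
```

The penalty's gradient is largest near zero, so weights of uninformative features are driven back to (near) zero while large weights are barely shrunk — the qualitative behavior the abstract attributes to the approximate \(\ell _{0}\)-norm term.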







Data availability
In our experiments, all classification datasets are public datasets from the UCI Machine Learning Repository. They are available online.
Acknowledgements
The authors would like to thank the editors and anonymous reviewers for their valuable comments and suggestions. This research has been partially supported by the Natural Science Foundation of Jiangsu under grant BK20241903, in part by the Major Program of the Natural Science Foundation of the Jiangsu Higher Education Institutions of China under grant 22KJA520003, in part by the National Natural Science Foundation of China under grant 61877061, in part by the Key Program of the National Natural Science Foundation of China under grants 92046026 and 91646204, in part by the Jiangsu Provincial Key Research and Development Program under grant BE2020001-3, and in part by the Jiangsu Provincial Policy Guidance Program under grant BZ2020008.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Informed consent
Not applicable—no consent was needed to carry out this research.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Chen, Z., Zhang, Z., Li, S. et al. Accelerated multi-kernel sparse stochastic optimization classifier algorithm for explainable prediction. Pattern Anal Applic 27, 144 (2024). https://doi.org/10.1007/s10044-024-01367-9