
Accelerated multi-kernel sparse stochastic optimization classifier algorithm for explainable prediction

Theoretical Advances · Published in Pattern Analysis and Applications

Abstract

For classification problems, training models that are both accurate and sparse within a limited time budget has been a longstanding challenge, and it becomes harder when datasets contain large numbers of irrelevant and redundant features. Feature selection is often used to address this problem, but it suffers from high time complexity and a trade-off between the number of selected features and predictive error. As a solution, we propose an accelerated multi-kernel sparse stochastic optimization classifier (AMSSOC) algorithm, which reconstructs the \(\ell _{1}\)-norm support vector classifier by introducing an approximate \(\ell _{0}\)-norm function, thereby minimizing the number of features the classifier uses. A novel column-wise multi-kernel matrix is integrated into the classifier to select key features and produce explainable predictions, and Nesterov's method is employed to accelerate training. Our main purpose is to find a feature subset that is as small as possible while improving prediction accuracy and explainability. Experimental results show that the proposed AMSSOC outperforms four existing classification methods on thirteen real-world and two image datasets, and the critical feature subsets it discovers play an important role in explainable prediction in practical applications.
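The full derivation is behind the paywall, so the following is an illustration only. A common smooth surrogate for the \(\ell _{0}\)-norm is the exponential approximation \(\Vert \beta \Vert _{0}\approx \sum _{j}(1-e^{-\alpha |\beta _{j}|})\); under that assumption (the paper's exact penalty, loss, and multi-kernel construction may differ), an AMSSOC-style objective over a column-wise kernel matrix \(K\in \mathbb {R}^{n\times d}\) with labels \(y_{i}\in \{-1,+1\}\) would take the form

$$\min _{\beta }\ \frac{1}{n}\sum _{i=1}^{n}\max \bigl (0,\,1-y_{i}K_{i}^{\top }\beta \bigr )+\lambda \sum _{j=1}^{d}\bigl (1-e^{-\alpha |\beta _{j}|}\bigr ),$$

where \(K_{i}\) is the \(i\)-th row of the kernel matrix. A minimal sketch of a Nesterov-accelerated stochastic subgradient step on this surrogate follows; the step schedule, the penalty form, and all names such as `amssoc_sketch`, `lam`, and `alpha` are assumptions made for demonstration, not the authors' implementation.

```python
# Illustrative sketch only: Nesterov-accelerated stochastic subgradient
# descent on a hinge loss plus a smooth exponential l0-surrogate penalty.
# This is NOT the authors' AMSSOC code; the penalty and step schedule
# are assumptions chosen to match the abstract's description.
import numpy as np

def amssoc_sketch(K, y, lam=0.1, alpha=5.0, lr=0.01, steps=5000, seed=0):
    """K: (n, d) column-wise (multi-)kernel matrix; y: labels in {-1, +1}."""
    rng = np.random.default_rng(seed)
    n, d = K.shape
    beta = np.zeros(d)       # current weights over kernel columns
    beta_prev = np.zeros(d)  # previous iterate, used by the momentum term
    for t in range(1, steps + 1):
        mom = (t - 1) / (t + 2)                 # Nesterov-style momentum schedule
        look = beta + mom * (beta - beta_prev)  # look-ahead (extrapolated) point
        i = rng.integers(n)                     # draw one stochastic sample
        margin = y[i] * (K[i] @ look)
        # Subgradient of the hinge loss at the look-ahead point
        g_loss = -y[i] * K[i] if margin < 1.0 else np.zeros(d)
        # Gradient of the smooth surrogate lam * sum_j (1 - exp(-alpha*|b_j|))
        g_pen = lam * alpha * np.exp(-alpha * np.abs(look)) * np.sign(look)
        beta_prev, beta = beta, look - lr * (g_loss + g_pen)
    return beta
```

Entries of \(\beta \) driven to (near) zero discard the corresponding kernel columns, so the selected feature subset is directly readable from the trained classifier; this is the sense in which such a model supports explainable prediction.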


Data availability

In our experiments, all classification datasets are public datasets from the UCI Machine Learning Repository. They are available online.


Acknowledgements

The authors would like to thank the editors and anonymous reviewers for their valuable comments and suggestions. This research has been partially supported by the Natural Science Foundation of Jiangsu under grant BK20241903, in part by the Major Program of the Natural Science Foundation of the Jiangsu Higher Education Institutions of China under grant 22KJA520003, in part by the National Natural Science Foundation of China under grant 61877061, in part by the Key Program of the National Natural Science Foundation of China under grants 92046026 and 91646204, in part by the Jiangsu Provincial Key Research and Development Program under grant BE2020001-3, and in part by the Jiangsu Provincial Policy Guidance Program under grant BZ2020008.

Author information


Corresponding author

Correspondence to Zhiwang Zhang.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Informed consent

Not applicable—no consent was needed to carry out this research.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Chen, Z., Zhang, Z., Li, S. et al. Accelerated multi-kernel sparse stochastic optimization classifier algorithm for explainable prediction. Pattern Anal Applic 27, 144 (2024). https://doi.org/10.1007/s10044-024-01367-9
