Fast Newton method to solve KLR based on multilevel circulant matrix with log-linear complexity

Abstract

Kernel logistic regression (KLR) is a conventional nonlinear classifier in machine learning. With the explosive growth of data size, the storage and computation of large dense kernel matrices are a major obstacle to scaling KLR. Even when the Nyström approximation is applied to solve KLR, the resulting method has time complexity \(O(nc^{2})\) and space complexity \(O(nc)\), where n is the number of training instances and c is the sample size. We propose a fast Newton method that solves large-scale KLR problems efficiently by exploiting the storage and computational advantages of a multilevel circulant matrix (MCM). Approximating the kernel matrix by an MCM reduces the storage cost to \(O(n)\), and further approximating the coefficient matrix of the Newton equation by an MCM reduces the computational complexity of each Newton iteration to \(O(n \log n)\). The proposed method runs in log-linear time per iteration because the product of an MCM (or its inverse) with a vector can be computed via the multidimensional fast Fourier transform (mFFT). Experimental results on several large-scale binary and multi-class classification problems show that the proposed method scales KLR to large problems with less memory consumption and shorter training time, without sacrificing test accuracy.
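The log-linear cost per iteration rests on a classical fact: a circulant matrix is diagonalized by the discrete Fourier transform, so multiplying it (or its inverse) by a vector costs only \(O(n \log n)\). The following sketch is a hypothetical one-level illustration in Python/NumPy, not the authors' MCM implementation (which applies the multidimensional FFT to multilevel circulant structure); the names circulant_matvec and circulant_solve are illustrative only.

    import numpy as np
    from scipy.linalg import circulant

    def circulant_matvec(first_col, x):
        # C @ x for C = circulant(first_col): the eigenvalues of a circulant
        # matrix are the DFT of its first column, so the product is a circular
        # convolution computed in O(n log n) with the FFT.
        return np.real(np.fft.ifft(np.fft.fft(first_col) * np.fft.fft(x)))

    def circulant_solve(first_col, b):
        # Solve C @ y = b in O(n log n); valid when no Fourier coefficient of
        # first_col is zero (e.g. when C is symmetric positive definite).
        return np.real(np.fft.ifft(np.fft.fft(b) / np.fft.fft(first_col)))

    rng = np.random.default_rng(0)
    n = 1024
    c = rng.standard_normal(n)
    c[0] += 2.0 * n          # diagonal dominance keeps the circulant well conditioned
    x = rng.standard_normal(n)

    C = circulant(c)         # dense reference: O(n^2) storage and matvec
    assert np.allclose(C @ x, circulant_matvec(c, x))
    assert np.allclose(np.linalg.solve(C, x), circulant_solve(c, x))

In the spirit of the paper, extending this to a d-level MCM replaces the one-dimensional FFT with the d-dimensional FFT (e.g. np.fft.fftn), which is what keeps both the kernel-matrix storage and the per-iteration cost of the Newton step at \(O(n)\) and \(O(n \log n)\), respectively.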

Data Availability

Some or all data, models, or code generated or used during the study are available from the corresponding author by request.

Notes

  1. Code is available at https://github.com/cnmusco/recursive-nystrom.

  2. Code is available at http://www.lfhsgre.org.

Acknowledgements

This work was supported by the National Natural Science Foundation of China [Grant number 61772020].

Funding

This study was funded by the National Natural Science Foundation of China.

Author information

Corresponding author

Correspondence to Shuisheng Zhou.

Ethics declarations

Conflict of interest/Competing interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zhang, J., Zhou, S., Fu, C. et al. Fast Newton method to solve KLR based on multilevel circulant matrix with log-linear complexity. Appl Intell 53, 21407–21421 (2023). https://doi.org/10.1007/s10489-023-04606-4
