Abstract
Kernel logistic regression (KLR) is a classical nonlinear classifier in machine learning. With the explosive growth of data, storing and computing large dense kernel matrices is a major obstacle to scaling KLR. Even when the Nyström approximation is applied, the resulting method has time complexity \(O(nc^{2})\) and space complexity \(O(nc)\), where \(n\) is the number of training instances and \(c\) is the sample size. We propose a fast Newton method that solves large-scale KLR problems efficiently by exploiting the storage and computational advantages of multilevel circulant matrices (MCMs). Approximating the kernel matrix by an MCM reduces the storage requirement to \(O(n)\), and further approximating the coefficient matrix of the Newton equation by an MCM reduces the computational cost of each Newton iteration to \(O(n \log n)\). The proposed method runs in log-linear time per iteration because the product of an MCM (or its inverse) with a vector can be computed via the multidimensional fast Fourier transform (mFFT). Experimental results on several large-scale binary- and multi-class classification problems show that the proposed method scales KLR to large problems with less memory consumption and less training time, without sacrificing test accuracy.
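To illustrate why the MCM structure yields these savings, consider the one-level (ordinary) circulant case; the multilevel case of the paper applies the multidimensional FFT analogously. A circulant matrix is diagonalized by the discrete Fourier transform, so both matrix-vector products and linear solves need only the first column and \(O(n \log n)\) work. The following NumPy sketch is purely illustrative and is not the authors' implementation:

```python
import numpy as np

def circulant_matvec(c, x):
    """Compute C @ x in O(n log n), where C is the circulant matrix
    with first column c, i.e. C[i, j] = c[(i - j) % n]."""
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))

def circulant_solve(c, b):
    """Solve C y = b in O(n log n): divide in the Fourier domain,
    since the DFT of c gives the eigenvalues of C."""
    return np.real(np.fft.ifft(np.fft.fft(b) / np.fft.fft(c)))

rng = np.random.default_rng(0)
n = 8
c = rng.standard_normal(n)
x = rng.standard_normal(n)

# Dense reference construction with O(n^2) storage -- exactly the
# cost that the circulant representation avoids.
C = np.array([[c[(i - j) % n] for j in range(n)] for i in range(n)])

assert np.allclose(C @ x, circulant_matvec(c, x))
assert np.allclose(circulant_solve(c, C @ x), x)
```

Only the first column `c` is ever stored (hence \(O(n)\) space), and each product or solve costs a constant number of FFTs. In the paper's setting, the coefficient matrix of each Newton equation is approximated by an MCM so the Newton step can be applied this way.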
Data Availability
Some or all data, models, or code generated or used during the study are available from the corresponding author upon request.
Notes
Code is available at https://github.com/cnmusco/recursive-nystrom.
Code is available at http://www.lfhsgre.org.
Acknowledgements
This work was supported by the National Natural Science Foundation of China [Grant number 61772020].
Funding
This study was funded by the National Natural Science Foundation of China.
Ethics declarations
Conflict of interest/Competing interests
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhang, J., Zhou, S., Fu, C. et al. Fast Newton method to solve KLR based on multilevel circulant matrix with log-linear complexity. Appl Intell 53, 21407–21421 (2023). https://doi.org/10.1007/s10489-023-04606-4