Abstract
Kernel-based methods have become popular in a wide variety of machine learning tasks. They rely on the computation of kernel functions, which implicitly map data from the input space to a very high-dimensional feature space. The efficient application of these functions has been an active subject of study over the last decade, with the main focus on improving the scalability of kernel-based methods. In this regard, kernel function approximation using explicit feature maps has emerged as a substitute for traditional kernel-based methods. Over the years, various theoretical advancements have been made to explicit kernel maps, especially to the method of random Fourier features (RFF), which is the main focus of our work. In this work, we systematically review the major developments in the theory of kernel function approximation and discuss their practical applications. Furthermore, we identify the shortcomings of current research and discuss possible avenues for future work.
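To make the idea concrete, the following is a minimal sketch of the random Fourier features construction for the Gaussian kernel, as introduced by Rahimi and Recht: frequencies are drawn from the kernel's spectral (Fourier) distribution, phases uniformly, and the inner product of the resulting low-dimensional features approximates the exact kernel. Function and variable names (`rff_features`, `gamma`, `D`) are illustrative, not from the paper.

```python
import numpy as np

def rff_features(X, D=500, gamma=1.0, rng=None):
    """Map rows of X to D random Fourier features approximating
    the Gaussian kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    rng = np.random.default_rng(rng)
    d = X.shape[1]
    # For this kernel the spectral distribution is N(0, 2*gamma*I);
    # phases are uniform on [0, 2*pi).
    W = rng.normal(0.0, np.sqrt(2.0 * gamma), size=(d, D))
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

# The inner product of feature vectors approximates the exact kernel,
# with error shrinking as D grows.
X = np.random.default_rng(0).normal(size=(5, 3))
Z = rff_features(X, D=5000, gamma=0.5, rng=1)
K_approx = Z @ Z.T
K_exact = np.exp(-0.5 * ((X[:, None] - X[None, :]) ** 2).sum(-1))
print(np.abs(K_approx - K_exact).max())
```

Once the data is mapped through `rff_features`, any linear method (linear regression, linear SVM, linear PCA) applied to the features behaves approximately like its kernelized counterpart, which is the source of the scalability gains discussed above.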
Acknowledgements
This publication is an outcome of the R&D work undertaken in a project under the Visvesvaraya Ph.D. Scheme of the Ministry of Electronics and Information Technology, Government of India, implemented by Digital India Corporation.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Cite this article
Francis, D.P., Raimond, K. Major advancements in kernel function approximation. Artif Intell Rev 54, 843–876 (2021). https://doi.org/10.1007/s10462-020-09880-z