
Major advancements in kernel function approximation

Published in: Artificial Intelligence Review

Abstract

Kernel-based methods have become popular in a wide variety of machine learning tasks. They rely on the computation of kernel functions, which implicitly map data from the input space to a very high-dimensional feature space. Efficient application of these functions has been a subject of study over the last decade, with the main focus on improving the scalability of kernel-based methods. In this regard, kernel function approximation using explicit feature maps has emerged as a substitute for traditional kernel-based methods. Over the years, various theoretical advancements have been made to explicit kernel maps, especially to the method of random Fourier features (RFF), which is the main focus of our work. In this work, we review the major developments in the theory of kernel function approximation in a systematic manner and discuss their practical applications. Furthermore, we identify the shortcomings of current research and discuss possible avenues for future work.
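For readers unfamiliar with RFF, the core idea can be sketched in a few lines of NumPy (a minimal illustration for this review, not code from the paper): frequencies are drawn from the spectral distribution of a shift-invariant kernel, and inner products of the resulting cosine features approximate the kernel function.

```python
import numpy as np

def rff_features(X, D, gamma, rng):
    """Map X (n x d) to D random Fourier features whose inner products
    approximate the RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    n, d = X.shape
    # Frequencies sampled from the kernel's spectral density, N(0, 2*gamma*I)
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, D))
    # Random phase offsets
    b = rng.uniform(0, 2 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
gamma = 0.5
Z = rff_features(X, D=2000, gamma=gamma, rng=rng)

# Exact kernel matrix vs. its explicit-feature-map approximation
K_exact = np.exp(-gamma * np.sum((X[:, None] - X[None]) ** 2, axis=-1))
K_approx = Z @ Z.T
```

Downstream, any linear method applied to `Z` (e.g. ridge regression on the feature matrix) then behaves approximately like its kernelized counterpart, at cost linear rather than quadratic in the number of samples.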




Acknowledgements

This publication is an outcome of R&D work undertaken in a project under the Visvesvaraya Ph.D. Scheme of the Ministry of Electronics and Information Technology, Government of India, implemented by Digital India Corporation.

Author information

Corresponding author: Deena P. Francis.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Francis, D.P., Raimond, K. Major advancements in kernel function approximation. Artif Intell Rev 54, 843–876 (2021). https://doi.org/10.1007/s10462-020-09880-z
