Abstract
Kernel-based methods have become popular in a wide variety of machine learning tasks. They rely on the computation of kernel functions, which implicitly map data from the input space to a very high-dimensional feature space. The efficient application of these functions has been an active subject of study over the last decade, with the main focus on improving the scalability of kernel-based methods. In this regard, kernel function approximation using explicit feature maps has emerged as a substitute for traditional kernel-based methods. Over the years, various theoretical advancements have been made to explicit kernel maps, especially to the method of random Fourier features (RFF), which is the main focus of our work. In this work, we systematically review the major developments in the theory of kernel function approximation and discuss their practical applications. Furthermore, we identify the shortcomings of current research and discuss possible avenues for future work.
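To make the idea concrete, the following is a minimal sketch of the random Fourier features construction for the Gaussian kernel, as introduced by Rahimi and Recht: frequencies are drawn from the kernel's spectral (Fourier) distribution, phases uniformly, and the inner product of the resulting low-dimensional features approximates the exact kernel. Function and variable names (`rff_features`, `gamma`, `D`) are illustrative, not from the paper.

```python
import numpy as np

def rff_features(X, D=500, gamma=1.0, rng=None):
    """Map rows of X to D random Fourier features approximating
    the Gaussian kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    rng = np.random.default_rng(rng)
    d = X.shape[1]
    # For this kernel the spectral distribution is N(0, 2*gamma*I);
    # phases are uniform on [0, 2*pi).
    W = rng.normal(0.0, np.sqrt(2.0 * gamma), size=(d, D))
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

# The inner product of feature vectors approximates the exact kernel,
# with error shrinking as D grows.
X = np.random.default_rng(0).normal(size=(5, 3))
Z = rff_features(X, D=5000, gamma=0.5, rng=1)
K_approx = Z @ Z.T
K_exact = np.exp(-0.5 * ((X[:, None] - X[None, :]) ** 2).sum(-1))
print(np.abs(K_approx - K_exact).max())
```

Once the data is mapped through `rff_features`, any linear method (linear regression, linear SVM, linear PCA) applied to the features behaves approximately like its kernelized counterpart, which is the source of the scalability gains discussed above.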
Acknowledgements
This publication is an outcome of the R&D work undertaken in a project under the Visvesvaraya Ph.D. Scheme of the Ministry of Electronics and Information Technology, Government of India, implemented by Digital India Corporation.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Cite this article
Francis, D.P., Raimond, K. Major advancements in kernel function approximation. Artif Intell Rev 54, 843–876 (2021). https://doi.org/10.1007/s10462-020-09880-z