Abstract
The problem of optimization in machine learning is well established, but it entails several approximations. The theory of Hilbert spaces, which is principled and well established, helps solve the representation problem in machine learning by providing a rich (universal) class of functions in which the optimization can be conducted. Working with functions is cumbersome, but for the class of reproducing kernel Hilbert spaces (RKHSs) it remains manageable, provided the algorithm is restricted to inner products. The best example is the support vector machine (SVM), a batch-mode algorithm that uses a very efficient (supralinear) optimization procedure. The problem with SVMs, however, is their large memory and computational complexity. In the large-scale data limit SVMs are restrictive because, for fast operation, the Gram matrix, whose size grows with the square of the number of samples, must fit in computer memory. Even in this best-case scenario, the computation is also proportional to the square of the number of samples. This is not specific to the SVM algorithm; it is shared by kernel regression.

There are also other relevant data-processing scenarios, such as streaming data (also called time series), where the size of the data is unbounded and the data are potentially nonstationary; batch mode is therefore not directly applicable and brings added difficulties. Online learning in kernel space is more efficient in many practical large-scale data applications. As the training data are presented sequentially to the learning system, online kernel learning in general requires much less memory and computational bandwidth. The drawback is that online algorithms converge only weakly (in the mean square sense) to the optimal solution, i.e., they have guaranteed convergence only within a ball of radius ε around the optimum (ε is controlled by the user). But because the theoretically optimal machine learning solution already involves many approximations, this is one more approximation that is worth exploring in practice.

The most important recent advance in this field is the development of kernel adaptive filters (KAFs). The KAF algorithms are developed in a reproducing kernel Hilbert space (RKHS), using the linear structure of this space to implement well-established linear adaptive algorithms (e.g., LMS, RLS, APA) and to obtain nonlinear filters in the original input space. The main goal of this chapter is to bring these new online learning techniques closer to readers from both the machine learning and signal processing communities. We focus mainly on the kernel least mean square (KLMS), kernel recursive least squares (KRLS), and kernel affine projection algorithms (KAPA). The derivation of the algorithms and some key aspects, such as mean-square convergence and sparsification of the solutions, are discussed. Several illustrative examples are also presented to demonstrate the learning performance.
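To put the quadratic scaling in perspective, the following Python sketch (illustrative sample sizes only, not from the chapter) tallies the memory a dense Gram matrix of n samples would occupy, and builds one explicitly for a small problem with a Gaussian kernel:

```python
import numpy as np

# Back-of-the-envelope memory for a dense Gram matrix K[i, j] = kappa(x_i, x_j),
# stored as float64: n^2 entries of 8 bytes each.
for n in (10_000, 100_000, 1_000_000):
    gib = n * n * 8 / 2**30
    print(f"n = {n:>9,}: {gib:,.1f} GiB")

# Building K explicitly for a small problem; O(n^2) in time and memory.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 3))                       # 500 samples, 3 features
sq = np.sum(X**2, axis=1)                               # squared norms
K = np.exp(-(sq[:, None] + sq[None, :] - 2 * X @ X.T))  # Gaussian kernel, width 1
```

At one million samples the Gram matrix alone needs roughly 7 TiB, which is why batch kernel methods become impractical at scale.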
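To make the KAF idea concrete, below is a minimal sketch of the KLMS update, assuming a Gaussian kernel; the hyperparameter names (eta for the step size, sigma for the kernel bandwidth) and the toy prediction task are our own illustrative choices, not the chapter's notation:

```python
import numpy as np

def gaussian_kernel(U, v, sigma):
    """Gaussian (RBF) kernel between each row of U and the vector v."""
    return np.exp(-np.sum((U - v) ** 2, axis=-1) / (2 * sigma**2))

class KLMS:
    """Minimal kernel least mean square filter: each sample becomes a
    center whose coefficient is the step size times the a-priori error."""
    def __init__(self, eta=0.5, sigma=1.0):
        self.eta, self.sigma = eta, sigma
        self.centers, self.coeffs = [], []

    def predict(self, u):
        if not self.centers:
            return 0.0
        k = gaussian_kernel(np.asarray(self.centers), u, self.sigma)
        return float(np.dot(self.coeffs, k))

    def update(self, u, d):
        e = d - self.predict(u)      # a-priori error on the new sample
        self.centers.append(u)       # allocate a new kernel unit
        self.coeffs.append(self.eta * e)
        return e

# One-step prediction of a noisy nonlinear series from 5 past samples.
rng = np.random.default_rng(0)
x = np.sin(0.3 * np.arange(1000)) ** 3 + 0.05 * rng.standard_normal(1000)
f = KLMS(eta=0.5, sigma=0.5)
errs = [f.update(x[t-5:t], x[t]) for t in range(5, len(x))]
print("MSE, first vs. last 100 steps:",
      np.mean(np.square(errs[:100])), np.mean(np.square(errs[-100:])))
```

Note that every sample becomes a center, so the network grows linearly with the data; this is precisely why the sparsification criteria discussed in the chapter (e.g., the novelty and surprise criteria, or quantization as in QKLMS) matter in practice.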
Abbreviations
- ALD: approximate linear dependency
- APA: affine projection algorithm
- BER: bit error rate
- CC: coherence criterion
- CIP: cross information potential
- EMSE: excess mean square error
- EW-KRLS: exponentially weighted KRLS
- EX-KRLS: extended kernel recursive least squares
- FB-KRLS: fixed-budget KRLS
- FIR: finite impulse response
- i.i.d.: independent and identically distributed
- ITL: information theoretic learning
- KAF: kernel adaptive filter
- KAPA: kernel affine projection algorithm
- KLMS: kernel least mean square
- KMC: kernel maximum correntropy
- KRLS: kernel recursive least squares
- LMS: least mean square
- LS: least squares
- MG: Mackey–Glass
- MLP: multilayer perceptron
- MSE: mean square error
- NC: novelty criterion
- NLMS: normalized LMS
- NR: noise reduction
- OKL: online kernel learning
- QIP: quadratic information potential
- QKLMS: quantized KLMS
- RBF: radial basis function
- RKHS: reproducing kernel Hilbert space
- RLS: recursive least squares
- RN: regularization network
- SC: surprise criterion
- SNR: signal-to-noise ratio
- SPD: strictly positive definite
- SVM: support vector machine
- SW-KRLS: sliding-window KRLS
- VQ: vector quantization
- WEP: weight error power