Theoretical Methods in Machine Learning

Part of the book series: Springer Handbooks (SHB)

Abstract

The problem of optimization in machine learning is well established, but it entails several approximations. The theory of Hilbert spaces, which is principled and well established, helps solve the representation problem in machine learning by providing a rich (universal) class of functions in which the optimization can be conducted. Working with functions is cumbersome, but for the class of reproducing kernel Hilbert spaces (RKHSs) it remains manageable, provided the algorithm is restricted to inner products. The best example is the support vector machine (SVM), a batch-mode algorithm that uses a very efficient (supralinear) optimization procedure. However, the problem with SVMs is that they exhibit large memory and computational complexity. In the large-scale data limit, SVMs are restrictive because, for fast operation, the Gram matrix, which grows with the square of the number of samples, must fit in computer memory. Even in this best-case scenario, the computation is also proportional to the square of the number of samples. This is not specific to the SVM algorithm and is shared by kernel regression. There are also other relevant data processing scenarios, such as streaming data (also called time series), where the size of the data is unbounded and potentially nonstationary; batch-mode processing is therefore not directly applicable and brings added difficulties.
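
To make the scaling argument concrete, the short sketch below (an illustration added here, not part of the original chapter; the data, kernel width, and function name are assumptions for the example) builds the Gram matrix of a Gaussian kernel and reports its memory footprint, which grows with the square of the number of samples:

import numpy as np

def gaussian_gram(X, sigma=1.0):
    # Pairwise squared Euclidean distances between all N samples.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    # N x N Gram matrix: storage is O(N^2), and filling it costs O(N^2 * m) operations.
    return np.exp(-d2 / (2.0 * sigma ** 2))

X = np.random.randn(5000, 10)          # N = 5000 samples, m = 10 features
K = gaussian_gram(X)
print(K.shape, K.nbytes / 1e6, "MB")   # (5000, 5000), about 200 MB in double precision

Doubling the number of samples quadruples both the storage and the cost of filling the matrix, which is the bottleneck that the online methods discussed next try to avoid.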

Online learning in kernel space is more efficient in many practical large-scale data applications. Because the training data are presented sequentially to the learning system, online kernel learning in general requires much less memory and computational bandwidth. The drawback is that online algorithms converge only weakly (in the mean-square sense) to the optimal solution, i.e., convergence is guaranteed only within a ball of radius ε around the optimum (ε is controlled by the user). But because the theoretically optimal machine learning solution already involves many approximations, this is one more approximation that is worth exploring in practice. The most important recent advance in this field is the development of kernel adaptive filters (KAFs). The KAF algorithms are developed in a reproducing kernel Hilbert space (RKHS), exploiting the linear structure of this space to implement well-established linear adaptive algorithms (e.g., LMS, RLS, APA) and to obtain nonlinear filters in the original input space. The main goal of this chapter is to bring these new online learning techniques closer to readers from both the machine learning and signal processing communities. We focus mainly on the kernel least mean square (KLMS), kernel recursive least squares (KRLS), and kernel affine projection (KAPA) algorithms. The derivation of the algorithms and some key aspects, such as mean-square convergence and the sparsification of the solutions, are discussed. Several illustrative examples are also presented to demonstrate the learning performance.
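
As a minimal sketch of the KLMS idea described above (an illustration added here, not the chapter's reference implementation; the step size, kernel width, and toy data are assumptions), each input becomes a new center and the filter output is a kernel expansion whose coefficients are the step-size-scaled prediction errors:

import numpy as np

def klms(U, d, eta=0.5, sigma=1.0):
    # Basic kernel least mean square with a Gaussian kernel and no sparsification.
    centers, coeffs, errors = [], [], []
    for u, dn in zip(U, d):
        if centers:
            # Prediction: kernel expansion over all previously stored centers.
            k = np.exp(-np.sum((np.asarray(centers) - u) ** 2, axis=1) / (2.0 * sigma ** 2))
            y = float(np.dot(coeffs, k))
        else:
            y = 0.0
        e = dn - y              # prediction error
        centers.append(u)       # every input is stored as a new center
        coeffs.append(eta * e)  # its coefficient is the step size times the error
        errors.append(e)
    return np.asarray(centers), np.asarray(coeffs), np.asarray(errors)

# Toy usage: learn a static nonlinearity from noisy samples.
rng = np.random.default_rng(0)
U = rng.uniform(-1.0, 1.0, size=(500, 1))
d = np.sin(3.0 * U[:, 0]) + 0.05 * rng.standard_normal(500)
_, _, err = klms(U, d)
print(np.mean(err[:50] ** 2), np.mean(err[-50:] ** 2))  # error power drops as learning proceeds

The growing list of centers in this naive version is exactly why the sparsification criteria listed in the abbreviations (novelty, coherence, surprise, and quantization criteria) matter in practice.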


Abbreviations

ALD:

approximate linear dependency

APA:

affine projection algorithm

BER:

bit error rate

CC:

coherence criterion

CIP:

cross information potential

EMSE:

excess mean square error

EW-KRLS:

exponentially weighted KRLS

EX-KRLS:

extended kernel recursive least square

FB-KRLS:

fixed-budget KRLS

FIR:

finite impulse response

i.i.d.:

independent, identically distributed

ITL:

information theoretic learning

KAF:

kernel adaptive filter

KAPA:

kernel affine projection algorithm

KLMS:

kernel least mean square

KMC:

kernel maximum correntropy

KRLS:

kernel recursive least square

LMS:

least mean square

LS:

least square

MG:

Mackey–Glass

MLP:

multilayer perceptron

MSE:

mean square error

NC:

novelty criterion

NLMS:

normalized LMS

NR:

noise reduction

OKL:

online kernel learning

QIP:

quadratic information potential

QKLMS:

quantized KLMS

RBF:

radial basis function

RKHS:

reproducing kernel Hilbert space

RLS:

recursive least square

RN:

regularization network

SC:

surprise criterion

SNR:

signal-to-noise ratio

SPD:

strictly positive definite

SVM:

support vector machine

SW-KRLS:

sliding window KRLS

VQ:

vector quantization

WEP:

weight error power


Author information

Correspondence to Badong Chen.

Copyright information

© 2015 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Chen, B., Liu, W., Principe, J.C. (2015). Theoretical Methods in Machine Learning. In: Kacprzyk, J., Pedrycz, W. (eds) Springer Handbook of Computational Intelligence. Springer Handbooks. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-43505-2_30

  • DOI: https://doi.org/10.1007/978-3-662-43505-2_30

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-43504-5

  • Online ISBN: 978-3-662-43505-2
