Theoretical Methods in Machine Learning

Part of the book series: Springer Handbooks (SHB)

Abstract

The problem of optimization in machine learning is well established, but it entails several approximations. The theory of Hilbert spaces, which is principled and well established, helps solve the representation problem in machine learning by providing a rich (universal) class of functions in which the optimization can be conducted. Working with functions is cumbersome, but for the class of reproducing kernel Hilbert spaces (RKHSs) it remains manageable, provided the algorithm is restricted to inner products. The best example is the support vector machine (SVM), a batch-mode algorithm that uses a very efficient (supralinear) optimization procedure. However, the problem with SVMs is that they exhibit large memory and computational complexity. In the large-scale data limit, SVMs are restrictive because, for fast operation, the Gram matrix, which grows with the square of the number of samples, must fit in computer memory. Even in this best-case scenario, the computation is also proportional to the square of the number of samples. This is not specific to the SVM algorithm and is shared by kernel regression. There are also other relevant data processing scenarios, such as streaming data (also called time series), where the size of the data is unbounded and potentially nonstationary; batch-mode processing is therefore not directly applicable and brings added difficulties.
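
To make the scaling argument concrete, the short sketch below (an illustration added here, not part of the original chapter; the data, kernel width, and function name are assumptions for the example) builds the Gram matrix of a Gaussian kernel and reports its memory footprint, which grows with the square of the number of samples:

import numpy as np

def gaussian_gram(X, sigma=1.0):
    # Pairwise squared Euclidean distances between all N samples.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    # N x N Gram matrix: storage is O(N^2), and filling it costs O(N^2 * m) operations.
    return np.exp(-d2 / (2.0 * sigma ** 2))

X = np.random.randn(5000, 10)          # N = 5000 samples, m = 10 features
K = gaussian_gram(X)
print(K.shape, K.nbytes / 1e6, "MB")   # (5000, 5000), about 200 MB in double precision

Doubling the number of samples quadruples both the storage and the cost of filling the matrix, which is the bottleneck that the online methods discussed next try to avoid.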

Online learning in kernel space is more efficient in many practical large-scale data applications. Because the training data are presented sequentially to the learning system, online kernel learning in general requires much less memory and computational bandwidth. The drawback is that online algorithms converge only weakly (in the mean-square sense) to the optimal solution, i.e., convergence is guaranteed only within a ball of radius ε around the optimum (ε is controlled by the user). But because the theoretically optimal machine learning solution already involves many approximations, this is one more approximation that is worth exploring in practice. The most important recent advance in this field is the development of kernel adaptive filters (KAFs). The KAF algorithms are developed in a reproducing kernel Hilbert space (RKHS), exploiting the linear structure of this space to implement well-established linear adaptive algorithms (e.g., LMS, RLS, APA) and to obtain nonlinear filters in the original input space. The main goal of this chapter is to bring these new online learning techniques closer to readers from both the machine learning and signal processing communities. We focus mainly on the kernel least mean square (KLMS), kernel recursive least squares (KRLS), and kernel affine projection (KAPA) algorithms. The derivation of the algorithms and some key aspects, such as mean-square convergence and the sparsification of the solutions, are discussed. Several illustrative examples are also presented to demonstrate the learning performance.
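
As a minimal sketch of the KLMS idea described above (an illustration added here, not the chapter's reference implementation; the step size, kernel width, and toy data are assumptions), each input becomes a new center and the filter output is a kernel expansion whose coefficients are the step-size-scaled prediction errors:

import numpy as np

def klms(U, d, eta=0.5, sigma=1.0):
    # Basic kernel least mean square with a Gaussian kernel and no sparsification.
    centers, coeffs, errors = [], [], []
    for u, dn in zip(U, d):
        if centers:
            # Prediction: kernel expansion over all previously stored centers.
            k = np.exp(-np.sum((np.asarray(centers) - u) ** 2, axis=1) / (2.0 * sigma ** 2))
            y = float(np.dot(coeffs, k))
        else:
            y = 0.0
        e = dn - y              # prediction error
        centers.append(u)       # every input is stored as a new center
        coeffs.append(eta * e)  # its coefficient is the step size times the error
        errors.append(e)
    return np.asarray(centers), np.asarray(coeffs), np.asarray(errors)

# Toy usage: learn a static nonlinearity from noisy samples.
rng = np.random.default_rng(0)
U = rng.uniform(-1.0, 1.0, size=(500, 1))
d = np.sin(3.0 * U[:, 0]) + 0.05 * rng.standard_normal(500)
_, _, err = klms(U, d)
print(np.mean(err[:50] ** 2), np.mean(err[-50:] ** 2))  # error power drops as learning proceeds

The growing list of centers in this naive version is exactly why the sparsification criteria listed in the abbreviations (novelty, coherence, surprise, and quantization criteria) matter in practice.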


Abbreviations

ALD:

approximate linear dependency

APA:

affine projection algorithm

BER:

bit error rate

CC:

coherence criterion

CIP:

cross information potential

EMSE:

excess mean square error

EW-KRLS:

exponentially weighted KRLS

EX-KRLS:

extended kernel recursive least square

FB-KRLS:

fixed-budget KRLS

FIR:

finite impulse response

i.i.d.:

independent, identically distributed

ITL:

information theoretic learning

KAF:

kernel adaptive filter

KAPA:

kernel affine projection algorithm

KLMS:

kernel least mean square

KMC:

kernel maximum correntropy

KRLS:

kernel recursive least square

LMS:

least mean square

LS:

least square

MG:

Mackey–Glass

MLP:

multilayer perceptron

MSE:

mean square error

NC:

novelty criterion

NLMS:

normalized LMS

NR:

noise reduction

OKL:

online kernel learning

QIP:

quadratic information potential

QKLMS:

quantized KLMS

RBF:

radial basis function

RKHS:

reproducing kernel Hilbert space

RLS:

recursive least square

RN:

regularization network

SC:

surprise criterion

SNR:

signal-to-noise ratio

SPD:

strictly positive definite

SVM:

support vector machine

SW-KRLS:

sliding window KRLS

VQ:

vector quantization

WEP:

weight error power


Author information

Correspondence to Badong Chen.

Copyright information

© 2015 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Chen, B., Liu, W., Principe, J.C. (2015). Theoretical Methods in Machine Learning. In: Kacprzyk, J., Pedrycz, W. (eds) Springer Handbook of Computational Intelligence. Springer Handbooks. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-43505-2_30

  • DOI: https://doi.org/10.1007/978-3-662-43505-2_30

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-43504-5

  • Online ISBN: 978-3-662-43505-2
