A kernel-based Adaline for function approximation
Introduction
By expanding a function in series form, it can be represented to an arbitrary degree of accuracy by taking enough terms. It is therefore possible, in principle, to conduct a linear regression on a new set of variables transformed by a fixed mapping. However, this leads to a large computational burden and requires an infeasible amount of data from which the coefficients must be estimated, so it is not generally practical for function approximation. The algorithm studied in [1] is a linear Perceptron [12], which is computed implicitly in an infinite-dimensional space (called the linearisation space) using potential functions (sometimes also called kernel functions). The idea has been exploited in the potential function algorithm [1], and more recently in the support vector machine (SVM) [2] and the linear programming machine [5], [10], [11]. The kernel Adatron [4], [6] provides a simple and robust alternative to these classifiers, constructing large-margin discriminant functions sequentially and thereby avoiding the computationally intensive quadratic programming problem of the SVM.
We now use potential function kernels to develop a non-linear version of the Adaline [15], yielding a general, non-linear adaptive mapping device via an algorithm with well-understood properties. Selecting an appropriate potential function and its parameter specifies the mapping to the linearisation space, which can be done via the usual train/test cycle of neural networks.
The Adaline
The Adaline [15] has the form f(x) = 〈w,x〉 + b, where x is an n-vector of input data, w a set of n weights, b a bias term, and 〈w,x〉 = ∑_{i=1}^{n} w_i x_i denotes a scalar product.
The Adaline is a member of the general class known as Perceptrons. Its objective is, given some measured (target) data, {(x_i, y_i)}, to adjust the weight vector, w, so that the mean squared error (MSE) between the model's output, f(x_i), and the target values, y_i, of the training patterns is minimised. Replacing the true
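The sequential LMS (Widrow-Hoff) training of the linear Adaline described above can be sketched as follows. This is a minimal illustration, not the paper's implementation; the function name, learning rate η, and epoch count are illustrative choices.

```python
import numpy as np

def adaline_lms(X, y, eta=0.01, epochs=500):
    """Train a linear Adaline f(x) = <w, x> + b with the LMS rule.

    Each training pattern is presented in turn and the weights are
    nudged in proportion to the instantaneous error (Widrow-Hoff update).
    Names and defaults here are illustrative, not from the paper.
    """
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            err = y_i - (w @ x_i + b)   # target minus model output
            w += eta * err * x_i        # LMS update of the weights
            b += eta * err              # LMS update of the bias
    return w, b
```

On noiseless linear data the weights converge towards the least-squares solution, provided η is small enough relative to the input autocorrelation spectrum.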
Potential function Adaline for non-linear function approximation
If a fixed mapping of the input data, z_j = ψ(x_j), were known such that the transform ψ(·) is rich enough [1] to capture the underlying functional form of the signal to be learned, the data could be transformed and fitted by a linear combination of the z's. Through the use of positive semidefinite potential function kernels (Mercer kernels) it is possible to perform the mappings implicitly. Mercer kernels represent inner products in some Hilbert space [3], and thus the mapping of two samples into a high
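One way to sketch the implicit LMS update in the linearisation space: since the weight vector can be written as w = ∑_j α_j ψ(x_j), the model becomes f(x) = ∑_j α_j k(x_j, x) + b, and presenting pattern i adds η times its error to α_i. The code below is an illustrative sketch of this idea using a radial basis kernel, not the paper's exact implementation; all names and parameter values are assumptions.

```python
import numpy as np

def rbf_kernel(x, z, sigma=1.0):
    # Radial basis potential function k(x, z) = exp(-||x - z||^2 / (2 sigma^2))
    return np.exp(-np.sum((x - z) ** 2) / (2.0 * sigma ** 2))

def kernel_adaline_fit(X, y, sigma=1.0, eta=0.1, epochs=100):
    """LMS performed implicitly in the linearisation space.

    With w = sum_j alpha_j psi(x_j), the update w += eta * err * psi(x_i)
    reduces to alpha_i += eta * err, so only kernel evaluations are needed.
    Illustrative sketch; names and defaults are assumptions.
    """
    n = len(X)
    # Gram matrix of the potential function kernel
    K = np.array([[rbf_kernel(X[i], X[j], sigma) for j in range(n)]
                  for i in range(n)])
    alpha = np.zeros(n)
    b = 0.0
    for _ in range(epochs):
        for i in range(n):
            err = y[i] - (K[i] @ alpha + b)  # error of the implicit model
            alpha[i] += eta * err            # w <- w + eta * err * psi(x_i)
            b += eta * err
    return alpha, b
```

Note that no explicit feature vectors are ever formed: the whole computation runs through the Gram matrix, which is what makes an infinite-dimensional linearisation space tractable.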
Computer experiments
Static curve fitting: Here we fit the sinc function, uniformly sampled on the interval (−10,10), with added noise ∼N(0,0.04). Identifying a minimum (or infimum) in the out-of-sample (OOS) MSE and reverting to that point allows us to perform early-stopping regularisation, which acts in a similar way to weight decay.
A radial basis potential function kernel is used with three values of σ and η=0.01. A training MSE of near zero is achieved for σ=0.2 (not shown), indicating that the fit is over-specialised to the training data. This effect
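The early-stopping procedure of this experiment can be sketched as below: generate noisy sinc data, run the kernel LMS updates, and keep the coefficients from the epoch with the smallest validation (OOS) MSE. This is a self-contained illustration under assumed settings (σ=1, η=0.1, a 50/50 train/validation split), not the paper's exact configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy sinc data on (-10, 10): sin(x)/x plus noise ~ N(0, 0.04) (std 0.2).
X = np.linspace(-10, 10, 100)
y = np.sinc(X / np.pi) + rng.normal(0.0, 0.2, X.shape)
X_tr, y_tr, X_val, y_val = X[::2], y[::2], X[1::2], y[1::2]

# Radial basis potential function kernel (sigma and eta are assumed values).
sigma, eta = 1.0, 0.1
K_tr = np.exp(-(X_tr[:, None] - X_tr[None, :]) ** 2 / (2 * sigma ** 2))
K_val = np.exp(-(X_val[:, None] - X_tr[None, :]) ** 2 / (2 * sigma ** 2))

alpha, b = np.zeros(len(X_tr)), 0.0
best = (np.inf, alpha.copy(), b)
for epoch in range(200):
    for i in range(len(X_tr)):
        err = y_tr[i] - (K_tr[i] @ alpha + b)  # kernel LMS update
        alpha[i] += eta * err
        b += eta * err
    val_mse = np.mean((K_val @ alpha + b - y_val) ** 2)
    if val_mse < best[0]:
        # Revert to this point later: early-stopping regularisation.
        best = (val_mse, alpha.copy(), b)
```

With a well-chosen σ the held-out MSE levels off near the noise floor, whereas a very small σ would drive the training MSE towards zero while the OOS error grows, which is the over-specialisation effect described above.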
Conclusion
The kernel Adaline is a sequential and numerically robust algorithm of the potential function Perceptron type, which uses kernels for high (possibly infinite) order expansions and applies the LMS rule in the high-dimensional linearisation space. A non-linear version of the hitherto linear Adaline has been developed whose convergence properties follow directly from analysis of the LMS algorithm, as a consequence of operating in the linearisation space.
Initial experiments show that, while the
References (15)
- M.A. Aizerman, E.M. Braverman, L.I. Rozonoer, Theoretical foundations of the potential function method in pattern recognition learning, Automation and Remote Control (1964)
- C. Cortes, V. Vapnik, Support vector networks, Machine Learning (1995)
- R. Courant, D. Hilbert, Methods of Mathematical Physics (1951)
- T-T. Frieß, N. Cristianini, et al., The kernel-Adatron algorithm: a fast and simple learning procedure for support...
- T-T. Frieß, R.F. Harrison, Perceptrons in kernel feature spaces, Research Report No. 720, The University of Sheffield,...
- T-T. Frieß, R.F. Harrison, Support vector neural networks: the kernel Adatron with bias and soft margin, Research Report...
- T-T. Frieß, R.F. Harrison, The kernel Adatron with bias unit: analysis of the algorithm, Research Report No. 728, The...