A kernel-based Adaline for function approximation

https://doi.org/10.1016/S1088-467X(99)00025-6

Abstract

In this work the kernel Adaline algorithm is presented. The new algorithm is a generalisation of Widrow and Hoff's linear Adaline and allows non-linear functional relationships to be approximated. Like the linear Adaline, the proposed neural network algorithm minimises the least-mean-squared (LMS) cost function. The kernel Adaline's cost function is guaranteed to be convex, so the method does not suffer from the local optima encountered in conventional neural networks. The algorithm uses potential function operators due to Aizerman and colleagues to map the training points, in a first stage, into a very high-dimensional non-linear “feature” space. In the second stage the LMS solution in this space is determined by the algorithm. Weight decay regularisation makes it possible to avoid overfitting effects and can be performed efficiently. The kernel Adaline algorithm works in a sequential fashion, is conceptually simple, and is numerically robust. The method shows high performance in tasks such as one-dimensional curve fitting, system identification, and speech processing.

Introduction

By expanding a function in series form it can be represented to an arbitrary degree of accuracy, provided enough terms are taken. It is therefore possible, in principle, to conduct a linear regression on a new set of variables obtained by transforming the inputs through a fixed mapping. However, this leads to a large computational burden and to the need for an infeasible amount of data from which the coefficients must be estimated, so it is not generally practical for function approximation. The algorithm studied in [1] is a linear Perceptron [12] that is computed implicitly in a space of possibly infinite dimension (called the linearisation space) using potential functions (sometimes also called kernel functions). The idea has been exploited in the potential function algorithm [1], and more recently in the support vector machine (SVM) [2] and the linear programming machine [5], [10], [11]. The kernel Adatron [4], [6] provides a simple and robust alternative to these classifiers, constructing arbitrary large-margin discriminant functions sequentially and avoiding the intensive quadratic programming problem of the SVM.

We now use potential function kernels to develop a non-linear version of the Adaline [15], yielding a general, non-linear adaptive mapping device via an algorithm with well-understood properties. Selecting an appropriate potential function and its parameter specifies the mapping to the linearisation space, which can be done via the usual train/test cycle of neural networks.

Section snippets

The Adaline

The Adaline [15] has the form f(x) = 〈w, x〉 + b, where x is an n-vector of input data, w a set of n weights, b a bias term, and 〈w, x〉 = ∑_{i=1}^{n} w_i x_i denotes a scalar product.

The Adaline is a member of the general class known as Perceptrons. Its objective is, given some measured (target) data y_i, i ∈ {1, …, L}, to adjust the weight vector w so that the mean squared error (MSE) between the model's output, f(x), and the target values y_i of the training patterns x_i, i ∈ {1, …, L}, is minimised. Replacing the true …
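The following is a minimal sketch, in Python, of the linear Adaline trained with the Widrow-Hoff (LMS) rule described above; the learning rate and epoch count are illustrative choices, not values taken from the paper.

```python
import numpy as np

def adaline_lms(X, y, eta=0.01, n_epochs=50):
    """Fit f(x) = <w, x> + b by sequential LMS (Widrow-Hoff) descent on the MSE."""
    L, n = X.shape
    w = np.zeros(n)          # weight vector
    b = 0.0                  # bias term
    for _ in range(n_epochs):
        for i in range(L):
            error = y[i] - (X[i] @ w + b)    # target minus model output
            w += eta * error * X[i]          # Widrow-Hoff weight update
            b += eta * error                 # bias update
    return w, b
```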

Potential function Adaline for non-linear function approximation

If a fixed mapping of the input data, z_j = ψ(x_j), were known such that the transform ψ(·) is rich enough [1] to capture the underlying functional form of the signal to learn, the data could be transformed and fitted by a linear combination of the z's. Through the use of positive semidefinite potential function kernels (Mercer kernels) it is possible to perform the mappings implicitly. Mercer kernels represent inner products in some Hilbert space [3], and thus the mapping of two samples into a high …
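Because the feature-space weight vector can be written as a linear combination of the mapped training points, an LMS step w ← w + η e_i ψ(x_i) reduces to updating a single expansion coefficient. The sketch below illustrates such a dual-form training loop under assumed details: the Gaussian radial basis potential function, the per-epoch weight decay factor `decay`, and all function names are illustrative, not the paper's exact formulation.

```python
import numpy as np

def rbf_kernel(a, b, sigma=1.0):
    """Gaussian radial basis potential function k(a, b)."""
    return np.exp(-np.sum((a - b) ** 2) / (2.0 * sigma ** 2))

def kernel_adaline_fit(X, y, sigma=1.0, eta=0.01, n_epochs=100, decay=0.0):
    """Sequential LMS training carried out implicitly in the linearisation space."""
    L = len(X)
    # Gram matrix of pairwise kernel evaluations on the training set
    K = np.array([[rbf_kernel(X[i], X[j], sigma) for j in range(L)]
                  for i in range(L)])
    alpha = np.zeros(L)      # expansion coefficients of w over the mapped points
    b = 0.0
    for _ in range(n_epochs):
        for i in range(L):
            error = y[i] - (K[i] @ alpha + b)   # residual for pattern i
            alpha[i] += eta * error             # LMS step, expressed in dual form
            b += eta * error
        alpha *= (1.0 - decay)                  # weight decay regularisation
    return alpha, b

def kernel_adaline_predict(x, X, alpha, b, sigma=1.0):
    """Evaluate f(x) = sum_i alpha_i * k(x_i, x) + b."""
    return sum(a * rbf_kernel(xi, x, sigma) for a, xi in zip(alpha, X)) + b
```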

Computer experiments

Static curve fitting: Here we fit the sinc function, uniformly sampled on the interval (−10, 10), with added noise ∼N(0, 0.04). Identifying a minimum (or infimum) in the out-of-sample (OOS) MSE and reverting to that point allows us to perform early stopping (weight decay) regularisation.

A radial basis potential function kernel is used with three values of σ and η = 0.01. A near-zero value of training MSE is achieved for σ = 0.2 (not shown), indicating that the fit is over-specialised to the training data. This effect …
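As an illustration of this experiment, the snippet below reuses the hypothetical `kernel_adaline_fit` / `kernel_adaline_predict` sketch from the previous section on the noisy sinc task: uniform samples on (−10, 10), Gaussian noise of variance 0.04, and η = 0.01. The sample count, σ value, epoch count, and the held-out split are assumptions made for the demonstration, not the paper's experimental settings.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(-10, 10, 200)).reshape(-1, 1)
y = np.sinc(x.ravel() / np.pi) + rng.normal(0.0, 0.2, 200)  # sin(x)/x plus N(0, 0.04) noise

x_train, y_train = x[::2], y[::2]        # even-indexed samples for training
x_test, y_test = x[1::2], y[1::2]        # odd-indexed samples held out

alpha, b = kernel_adaline_fit(x_train, y_train, sigma=1.0, eta=0.01, n_epochs=200)
pred = np.array([kernel_adaline_predict(xi, x_train, alpha, b, sigma=1.0)
                 for xi in x_test])
print("held-out MSE:", np.mean((pred - y_test) ** 2))
```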

Conclusion

The kernel Adaline is a sequential and numerically robust algorithm of the potential function Perceptron type, which uses kernels for high- (possibly infinite-) order expansions and applies the LMS rule in the high-dimensional linearisation space. A non-linear version of the hitherto linear Adaline has been developed whose convergence properties follow directly from analysis of the LMS algorithm, as a consequence of operating in the linearisation space.

Initial experiments show that, while the …

References (15)

  • M. Aizerman et al., Theoretical foundations of the potential function method in pattern recognition learning, Automation and Remote Control (1964)
  • C. Cortes et al., Support vector networks, Machine Learning (1995)
  • R. Courant et al., Methods of Mathematical Physics (1951)
  • T-T. Frieß, N. Cristianini, et al., The kernel-Adatron algorithm: a fast and simple learning procedure for support...
  • T-T. Frieß, R.F. Harrison, Perceptrons in kernel feature spaces, Research Report No. 720, The University of Sheffield,...
  • T-T. Frieß, R.F. Harrison, Support vector neural networks: the kernel Adatron with bias and soft margin, Research Report...
  • T-T. Frieß, R.F. Harrison, The kernel adatron with bias unit: analysis of the algorithm, Research Report No. 728, The...
