Neurocomputing

Volume 70, Issues 7–9, March 2007, Pages 1342-1347

Improving generalization of MLPs with sliding mode control and the Levenberg–Marquardt algorithm

https://doi.org/10.1016/j.neucom.2006.09.003

Abstract

A variation of the well-known Levenberg–Marquardt algorithm for training neural networks is proposed in this work. The algorithm presented restricts the norm of the weight vector to a pre-established value and finds the minimum-error solution for that norm. The norm constraint controls the neural network's degrees of freedom: the larger the norm, the more flexible the neural model and, therefore, the more closely it fits the training set. A range of solutions with different norms is generated and the solution with the best generalization is selected according to the validation set error. The results show the efficiency of the algorithm in terms of generalization performance.

Introduction

The generalization performance of a neural network is defined by its capability to learn from a data set while minimizing the influence of the stochastic variable associated with noise. The error measured on a validation set is one way to estimate how well fitted the neural network is to the training set. In general, the final model should strike a balance between the training and validation set errors.

The topology of the network can be pruned to improve generalization [8]. The network parameters can also be optimized so that the appropriate amplitude of the weights is estimated using a validation criterion [1], [9], [11]. The generalization issue is known in the literature as the bias and variance dilemma [4] and is a central concern in training neural networks.

The multi-objective approach [10] evaluates the balance between bias and variance of a neural network by selecting solutions within the Pareto set in the space of two objectives: the norm of the weight vector, ||w||, and the training set error, V(w). In order to obtain the optimal solution within the Pareto set, multi-objective optimization techniques have been applied to the training problem [10]. More recently, a sliding mode algorithm that is able to generate arbitrary trajectories in the space of objectives has also been proposed [3]. The goal of these algorithms is to reach an arbitrary target point (V(w_t), ||w_t||) in the Pareto set by minimizing the distance between the current network solution (V(w_i), ||w_i||) at iteration i and (V(w_t), ||w_t||).
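As a concrete picture of this objective space, the short Python sketch below (our own illustration, with a linear model standing in for the MLP and all names invented here) computes the two objectives V(w) and ||w|| for a candidate weight vector and the Euclidean distance to an arbitrary target point, which is the quantity such trajectory-controlled algorithms drive towards zero.

import numpy as np

def objectives(w, X, y, model):
    """Return the pair (V(w), ||w||): training set error and weight norm."""
    e = y - model(X, w)                      # output error for every pattern
    return np.sum(e ** 2), np.linalg.norm(w)

def distance_to_target(w, X, y, model, V_t, norm_t):
    """Euclidean distance between the current point (V(w), ||w||) and an
    arbitrary target point (V_t, ||w_t||) in the space of objectives."""
    V, n = objectives(w, X, y, model)
    return np.hypot(V - V_t, n - norm_t)

# toy usage: a linear model stands in for the MLP
model = lambda X, w: X @ w
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.2, size=100)
print(distance_to_target(np.zeros(3), X, y, model, V_t=1.0, norm_t=2.0))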

Error minimization algorithms for neural network learning most frequently use local surface information to compute the updates. The gradient descent, or steepest-descent, method [2] implements a linear approximation of the error function at the operating point w(k). Its main advantage is the simplicity of implementation, but its drawback is the slow speed of convergence. In order to improve convergence, higher-order information about the surface near the operating point can be used. Newton's method uses a quadratic approximation of the error surface and is able to reach the minimum of a quadratic surface in one step. The method is efficient for low-dimensional problems, but, due to the calculation of the inverse Hessian matrix, its computational complexity increases drastically with problem dimension. In addition, it does not guarantee convergence for non-quadratic functions.
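The difference between the two update rules can be seen on a toy quadratic error surface V(w) = 0.5 wᵀAw − bᵀw (our own example, not from the paper): gradient descent takes many small steps along the negative gradient, while Newton's method, using the inverse Hessian, reaches the minimum of this quadratic in a single step.

import numpy as np

# toy quadratic error surface V(w) = 0.5 * w' A w - b' w
A = np.array([[3.0, 0.5],
              [0.5, 1.0]])      # Hessian of V
b = np.array([1.0, -2.0])

def grad(w):
    return A @ w - b            # gradient of V at w

w0 = np.zeros(2)

# gradient descent: linear approximation, many small steps, slow convergence
eta = 0.1
w_gd = w0.copy()
for _ in range(50):
    w_gd = w_gd - eta * grad(w_gd)

# Newton's method: quadratic approximation, reaches the minimum of V in one step
w_newton = w0 - np.linalg.solve(A, grad(w0))

print(w_gd, w_newton, np.linalg.solve(A, b))   # all approach the minimizer A^{-1} b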

The Levenberg–Marquardt algorithm [6] is a variation of Newton's method that locally approximates the surface by a quadratic function, which simplifies the calculation of the Hessian matrix. The Levenberg–Marquardt algorithm for neural network training [5] is one of the most efficient learning algorithms available. Its main advantage is its speed of convergence. However, its original formulation [6] is not concerned with generalization performance.

The algorithm presented in this paper aims at improving generalization performance as well as speed of convergence. It is based on the Levenberg–Marquardt algorithm and on the sliding mode control algorithm [3] to generate approximations to the Pareto set solutions. Sliding mode control minimizes the distance of the norm to the target norm value, while the Levenberg–Marquardt algorithm is used to accelerate the minimization of the training set error. Arbitrary points in the Pareto set can be estimated, from which the solution with the smallest validation error is selected. The results show that the association of the proposed methods generates approximations to the Pareto set solutions faster than the sliding mode control algorithm alone. The final solution presents good generalization performance, comparable to the Pareto set approximation obtained with the standard sliding mode control approach [3], but with increased convergence speed.
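The overall model-selection loop can be sketched as follows. This is only an illustration under simplifying assumptions: the helper train_at_norm below is a hypothetical stand-in that minimizes the training error of a linear model while holding the weight norm at a target value; in the paper that role is played by the combined sliding mode / Levenberg–Marquardt scheme applied to an MLP.

import numpy as np

def train_at_norm(w0, X, y, target_norm, steps=200, eta=0.05):
    """Hypothetical stand-in trainer for a linear model: gradient steps on the
    sum of squared errors, rescaling the weights to the target norm after
    every step (in the paper this role is played by the SMC/LM scheme)."""
    w = w0.copy()
    for _ in range(steps):
        e = y - X @ w
        w = w + eta * X.T @ e / len(y)               # descent step on the error
        w = w * target_norm / np.linalg.norm(w)      # hold the norm constraint
    return w

def sse(w, X, y):
    e = y - X @ w
    return np.sum(e ** 2)

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
w_true = np.array([1.0, -1.0, 0.5, 0.0, 0.0])
y = X @ w_true + rng.normal(scale=0.2, size=100)
X_val = rng.normal(size=(50, 5))
y_val = X_val @ w_true + rng.normal(scale=0.2, size=50)

# sweep a range of norm values and keep the solution with the best validation error
best = None
for target_norm in np.linspace(0.1, 3.0, 30):
    w = train_at_norm(np.full(5, 1e-3), X, y, target_norm)
    val_err = sse(w, X_val, y_val)
    if best is None or val_err < best[0]:
        best = (val_err, target_norm, w)

print("selected norm:", best[1], "validation SSE:", best[0])

In this toy sweep the smallest norms underfit and the largest approach the unconstrained solution; the validation error typically selects an intermediate norm, which is the behaviour the proposed algorithm exploits.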

Section snippets

The Levenberg–Marquardt algorithm

The Levenberg–Marquardt optimization algorithm [6] is an approximation of Newton's method, and its adaptation to neural network training [5] is much more efficient than the usual gradient-based techniques. For the equations that follow, the sum of squared errors V(w) for the current weight vector w is defined as

V(w) = Σ_{i=1}^{N} e_i²(w),     (1)

where e_i is the output error for each input pattern and w is the weight vector.

The one-step weight update equation using Newton's method, considering a quadratic approximation of the error surface around the current weight vector, is Δw = −H⁻¹∇V(w), where H is the Hessian matrix of V(w).
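For concreteness, the standard Levenberg–Marquardt update replaces the Hessian by the damped Gauss–Newton approximation, Δw = −(JᵀJ + μI)⁻¹Jᵀe, where J is the Jacobian of the residuals and μ the damping term. The sketch below applies this rule to a small nonlinear least-squares toy problem (an exponential model of our own choosing, not the MLP of the paper):

import numpy as np

def residuals(w, t, y):
    # toy exponential model y ≈ w0 * exp(-w1 * t); e_i = y_i - model output
    return y - w[0] * np.exp(-w[1] * t)

def jacobian(w, t):
    # partial derivatives of the residuals with respect to w
    J = np.empty((t.size, 2))
    J[:, 0] = -np.exp(-w[1] * t)
    J[:, 1] = w[0] * t * np.exp(-w[1] * t)
    return J

def lm_step(w, t, y, mu):
    """One Levenberg-Marquardt update: dw = -(J'J + mu*I)^{-1} J'e."""
    e = residuals(w, t, y)
    J = jacobian(w, t)
    H = J.T @ J + mu * np.eye(len(w))   # damped Gauss-Newton Hessian approximation
    return w - np.linalg.solve(H, J.T @ e)

rng = np.random.default_rng(2)
t = np.linspace(0.0, 3.0, 50)
y = 2.0 * np.exp(-1.5 * t) + rng.normal(scale=0.05, size=t.size)

w, mu = np.array([1.0, 1.0]), 1e-2
for _ in range(20):
    w = lm_step(w, t, y, mu)
print(w)   # should approach [2.0, 1.5]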

The sliding mode control algorithm

The Sliding Mode Control Multi-Objective (SMC-MOBJ) algorithm [3] can reach any solution (V_t, ||w_t||) in the feasible space of objectives, defined by the sum of squared errors V_t and the norm of the weight vector ||w_t||. Any trajectory in this space of objectives can be generated with the SMC-MOBJ algorithm by means of a sliding mode controller [7]. The controlled trajectory aims at reaching the Pareto set region that contains the solution that minimizes the error in relation to the generator
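The control idea can be pictured with a deliberately simplified sketch (ours; it is not the SMC-MOBJ control law of [3], [7]): at each iteration the weights take an error-descent step and the weight norm is then pushed toward a prescribed target by a bounded, sign-based correction, so the solution traces a controlled trajectory in the (V(w), ||w||) plane.

import numpy as np

def norm_controlled_step(w, X, y, target_norm, eta=0.05, rho=0.05):
    """One simplified update: an error-descent step followed by a bounded,
    sign-based correction that slides the weight norm toward target_norm.
    (Illustrative only; not the SMC-MOBJ control law.)"""
    e = y - X @ w                                  # output errors
    w = w + 2.0 * eta * X.T @ e / len(y)           # descent step on the SSE
    s = np.linalg.norm(w) - target_norm            # "sliding surface": norm error
    new_norm = np.linalg.norm(w) - rho * np.sign(s) if abs(s) > rho else target_norm
    return w * new_norm / np.linalg.norm(w)

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 4))
y = X @ np.array([1.0, -0.5, 0.25, 0.0]) + rng.normal(scale=0.2, size=100)

w = rng.normal(scale=0.1, size=4)
for _ in range(200):
    w = norm_controlled_step(w, X, y, target_norm=0.8)
print(np.linalg.norm(w))   # the norm settles at the 0.8 target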

The proposed algorithm

According to [10], MLP solutions can be visualized in the space of objectives defined by two cost functions: the sum of squared errors (Eq. (1)) and the norm of the weight vector. Models with high generalization are obtained by restricting the solutions to the Pareto set [10]. The norm is computed over all network weights,

||w||² = Σ_j Σ_i w_ji².     (11)

The weight-norm function (Eq. (11)) has a quadratic form with its minimum at the origin (null weights). The norm function does not have local minima and its shape is independent of the training data.
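For an MLP, the norm in Eq. (11) is simply the Euclidean norm of all weights gathered into a single vector. A minimal computation over example layer weight matrices (the 1–15–1 layer shapes here are our own example, echoing the experiment below; bias terms are omitted for brevity) illustrates this:

import numpy as np

rng = np.random.default_rng(4)
# example weight matrices for a 1-15-1 MLP (bias terms omitted for brevity)
W1 = rng.normal(size=(15, 1))    # input -> hidden weights
W2 = rng.normal(size=(1, 15))    # hidden -> output weights

# squared norm: sum of w_ji^2 over every connection (quadratic, minimum at the origin)
squared_norm = sum(np.sum(W ** 2) for W in (W1, W2))
print(squared_norm, np.sqrt(squared_norm))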

Results

In order to test the proposed algorithm and compare it with the standard Levenberg–Marquardt algorithm without the norm constraint, a regression problem is presented first for an oversized MLP with 15 hidden nodes, chosen arbitrarily. Training data were generated from the following function:

f(t) = 4.26(e^{−t} − 4e^{−2t} + 3e^{−3t}) + ε,

where ε is a normally distributed random variable with mean 0 and standard deviation σ = 0.2. The training set consists of 100 observations within the interval (0, 3.25). The Pareto set was
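The training data for this experiment can be reproduced with a few lines of Python. The exact sampling of t within (0, 3.25) and the construction of the validation set are not fully specified in this snippet, so uniform random sampling and an identically generated validation set are our assumptions:

import numpy as np

rng = np.random.default_rng(5)

def f(t, rng, sigma=0.2):
    # target function with additive Gaussian noise (mean 0, standard deviation sigma)
    noise = rng.normal(scale=sigma, size=np.shape(t))
    return 4.26 * (np.exp(-t) - 4.0 * np.exp(-2.0 * t) + 3.0 * np.exp(-3.0 * t)) + noise

# 100 training observations within the interval (0, 3.25); sampling scheme assumed uniform
t_train = np.sort(rng.uniform(0.0, 3.25, size=100))
y_train = f(t_train, rng)

# assumption: a validation set generated the same way, used only for model selection
t_val = np.sort(rng.uniform(0.0, 3.25, size=100))
y_val = f(t_val, rng)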

Conclusions

The Levenberg–Marquardt algorithm is known to be very fast for low-dimensional problems. Although its efficiency decreases for large data sets and networks, it is considered one of the most efficient learning algorithms in terms of convergence speed, measured by the number of epochs. The solutions obtained with Levenberg–Marquardt tend to have a very small error value and a large norm value, as shown in Fig. 2. This is due to the well-known trade-off between the conflicting behavior of the

References (11)

  • P.L. Bartlett, For valid generalization, the size of the weights is more important than the size of the network.
  • M.S. Bazaraa et al., Nonlinear Programming: Theory and Algorithms (January 1993).
  • M.A. Costa, A.P. Braga, B.R. Menezes, R.A. Teixeira, G.G. Parma, Training neural networks with a multi-objective...
  • S. Geman et al., Neural networks and the bias/variance dilemma, Neural Comput. (1992).
  • M.T. Hagan et al., Training feedforward networks with the Marquardt algorithm, IEEE Trans. Neural Networks (1994).
There are more references available in the full text version of this article.

Marcelo Azevedo Costa is an Associate Professor at the Department of Statistics at the Federal University of Minas Gerais, Brazil. He received a Ph.D. in Electrical Engineering from the Federal University of Minas Gerais, in 2002. His current research interest areas are Statistical Learning, Neural Networks, Spatial-Temporal Statistics and applications.

Antônio P. Braga is an Adjunct Professor at the Department of Electronics at the Federal University of Minas Gerais, Brazil. He has published several papers in international journals and conferences and has written one book on neural networks. He is the head of the Computational Intelligence Lab and co-editor-in-chief of the International Journal of Computational Intelligence and Applications, published by Imperial College Press, London. His current research interest areas are learning, hybrid systems, applications and distributed learning systems.

Benjamim R. Menezes received the B.S. degree in Electrical Engineering from the Federal University of Minas Gerais, Brazil, in 1977, the M.Sc. degree in Electrical Engineering from the Federal University of Rio de Janeiro, Brazil, in 1980, and the D.Eng. degree in Electrical Engineering from the Institut National Polytechnique de Lorraine, France, in 1985. He is currently a Professor at the Department of Electronics Engineering at Federal University of Minas Gerais. His areas of interest are intelligent control and control of electrical drives.
