Neurocomputing

Volume 14, Issue 1, January 1997, Pages 63-83

Paper
High performance training of feedforward and simple recurrent networks

https://doi.org/10.1016/0925-2312(95)00132-8

Abstract

TRAINREC is a system for training feedforward and recurrent neural networks that incorporates several ideas. It uses the conjugate-gradient method, which is demonstrably more efficient than traditional backward error propagation. We assume epoch-based training and derive a new error function having several desirable properties absent from the traditional sum-of-squares error function. We argue for skip (shortcut) connections where appropriate and for a bipolar sigmoid yielding values over the [−1, 1] interval. The input feature space is often over-analyzed, but by using singular value decomposition, input patterns can be conditioned for better learning, often with a reduced number of input units. Recurrent networks, in their most general form, require special handling: they cannot be treated as a simple re-wiring of the architecture without a corresponding revision of the derivative calculations. A careful balance is required among the network architecture (specifically, the hidden and feedback units), the amount of training applied, and the ability of the network to generalize. These issues often hinge on selecting the proper stopping criterion.

Discovering methods that work in theory as well as in practice is difficult, and we have spent a substantial amount of effort evaluating and testing these ideas on real problems to determine their value. This paper encapsulates a number of such ideas, ranging from those motivated by a desire for efficiency of training to those motivated by correctness and accuracy of the result. While this paper is intended to be self-contained, several references are provided to other work upon which many of our claims are based.
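
As a rough illustration only, the sketch below shows how two ideas from the abstract might look in practice: conditioning input patterns with a singular value decomposition and training a small tanh (bipolar sigmoid) feedforward network with a conjugate-gradient optimizer rather than plain backpropagation. It uses NumPy/SciPy; the function names, the plain sum-of-squares error, and the numerically estimated gradient are assumptions for brevity, not the TRAINREC code or its derived error function. Supplying an analytic gradient via SciPy's jac argument would be closer to a practical setup.

import numpy as np
from scipy.optimize import minimize

def condition_inputs(X, k):
    """Project input patterns onto their k leading right singular vectors."""
    # Economy-size SVD of the pattern matrix (rows are patterns).
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:k].T  # reduced, decorrelated input representation

def train_tanh_net(X, T, n_hidden, seed=0):
    """Fit a one-hidden-layer tanh network by conjugate gradient (SciPy 'CG')."""
    rng = np.random.default_rng(seed)
    n_in, n_out = X.shape[1], T.shape[1]
    shapes = [(n_hidden, n_in + 1), (n_out, n_hidden + 1)]  # +1 column for bias

    def unpack(w):
        split = shapes[0][0] * shapes[0][1]
        return w[:split].reshape(shapes[0]), w[split:].reshape(shapes[1])

    def forward(w):
        w1, w2 = unpack(w)
        h = np.tanh(np.c_[X, np.ones(len(X))] @ w1.T)      # bipolar hidden activations
        return np.tanh(np.c_[h, np.ones(len(h))] @ w2.T)   # outputs in [-1, 1]

    def sse(w):
        # Plain epoch-wise sum-of-squares error; the paper derives a different error function.
        return 0.5 * np.sum((forward(w) - T) ** 2)

    w0 = 0.1 * rng.standard_normal(sum(r * c for r, c in shapes))
    result = minimize(sse, w0, method="CG")  # conjugate gradient; gradient estimated numerically
    return result.x, result.fun

# Example: fit XOR with bipolar (+1/-1) targets after SVD conditioning of the inputs.
X = np.array([[-1., -1.], [-1., 1.], [1., -1.], [1., 1.]])
T = np.array([[-1.], [1.], [1.], [-1.]])
weights, final_error = train_tanh_net(condition_inputs(X, 2), T, n_hidden=3)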

Cited by (10)

    • Predicting equilibrium vapour pressure isotope effects by using artificial neural networks or multi-linear regression - A quantitative structure property relationship approach

      2015, Chemosphere
      Citation Excerpt:

      During this process, the connection weights are constantly adjusted by means of error propagated from the output to the input layer. The learning algorithm used in this study is a conjugate gradient, especially suited to a reduced number of data (Kalman and Kwasny, 1997; Livieris and Pintelas, 2013). The explanatory variables were standardized before being introduced into the MLP.

    • Reinforced learning systems based on merged and cumulative knowledge to predict human actions

      2014, Information Sciences
      Citation Excerpt:

      The feedback process uses current control errors and creates links between input data and output data. This process can be implemented through recurrent neural networks [4,6,15,16,37], such as Self-Organizing Maps (SOM) [4,7,18,52,56]. The input data for such systems can include the type and state of data used to manage knowledge content.

    • Combining conditional volatility forecasts using neural networks: An application to the EMS exchange rates

      1999, Journal of International Financial Markets, Institutions and Money

    This material is based upon work supported by the National Science Foundation under Grant No. IRI-9201987.
