Neural Networks
Volume 20, Issue 3, April 2007, Pages 335-352

2007 Special Issue
Optimization and applications of echo state networks with leaky-integrator neurons

https://doi.org/10.1016/j.neunet.2007.04.016

Abstract

Standard echo state networks (ESNs) are built from simple additive units with a sigmoid activation function. Here we investigate ESNs whose reservoir units are leaky-integrator units. Units of this type have individual state dynamics, which can be exploited in various ways to accommodate the network to the temporal characteristics of a learning task. We present stability conditions, introduce and investigate a stochastic gradient descent method for the optimization of the global learning parameters (input and output feedback scalings, leaking rate, spectral radius), and demonstrate the usefulness of leaky-integrator ESNs for (i) learning very slow dynamic systems and replaying the learnt system at different speeds, (ii) classifying relatively slow and noisy time series (the Japanese Vowels dataset, on which we obtain a zero test error rate), and (iii) recognizing strongly time-warped dynamic patterns.

Introduction

The idea that gave birth to the twin pair of echo state networks (ESNs) (Jaeger, 2001) and liquid state machines (LSMs) (Maass, Natschlaeger, & Markram, 2002) is simple. Use a large, random, recurrent neural network as an excitable medium (the “reservoir” or “liquid”) which, under the influence of input signals u(t), creates a high-dimensional collection of nonlinearly transformed versions x_i(t) of u(t) (the activations of its neurons), from which a desired output signal y(t) can be combined. This simple idea leads to likewise simple offline (Jaeger, 2001) and online (Jaeger, 2003) learning algorithms, sometimes amazingly accurate models (Jaeger & Haas, 2004), and may also be realized in vertebrate brains (Mauk and Buonomano, 2004, Stanley et al., 1999).

It is still largely unknown which properties of the reservoir are responsible for which strengths or weaknesses of an ESN on a particular task. Clearly, reservoirs differing in size, connectivity structure, type of neuron or other characteristics will behave differently when put to different learning tasks. Closer analytical investigation and optimization of reservoir dynamics have attracted the attention of several authors (Ozturk, Xu and Principe, 2007, Schiller and Steil, 2005, Schmidhuber et al., 2007, Zant et al., 2004). In our view, a door-opener result for a deeper understanding of reservoirs/liquids is the work of Maass, Joshi, and Sontag (in press), who show that LSMs with possibly nonlinear output readout functions can approximate nth-order dynamic systems arbitrarily well, if the liquid is augmented by n additional units which are trained on suitable auxiliary signals. Finally, it deserves to be mentioned that in theoretical neuroscience the question of how biological networks can process temporal information has been approached in a fashion that is related in spirit to ESNs/LSMs. Precise timing phenomena can be explained as emerging from the network dynamics as such, without the need for special timing mechanisms like clocks or delay lines (Mauk & Buonomano, 2004). Buonomano (2005) presents an unsupervised learning rule for randomly connected, spiking neural networks that results in the emergence of neurons representing a continuum of differently timed stimulus responses, while preserving global network stability.

In this paper we add to this growing body of “reservoir research” and take a closer look at ESNs whose reservoir is made from leaky-integrator neurons. Leaky-integrator ESNs were introduced in passing in Jaeger, 2001, Jaeger, 2002b; fragments of what will be reported here appeared first in a technical report (Lukoševičius, Popovici, Jaeger, & Siewert, 2006).

This article is composed as follows. In Section 2 we provide the system equations and point out basic stability conditions, amounting to algebraic criteria for the echo state property (Jaeger, 2001) in leaky-integrator ESNs. Leaky-integrator ESNs have one more global control parameter than standard sigmoid-unit ESNs: in addition to the input and output feedback scalings and the spectral radius of the reservoir weight matrix, a leaking rate has to be optimized. Section 3 explores the impact of these global controls on learning performance and introduces a stochastic gradient descent method for finding their optimal settings. The remainder is devoted to three case studies. First, managing very slow timescales by adjusting the leaky neurons’ time constants is demonstrated on the “figure eight” problem (Section 4). This is an autonomous pattern generation task which also presents interesting dynamic stability challenges. Second, we treat the “Japanese Vowels” dataset. Using leaky-integrator neurons and some tricks of the trade we were able to achieve, for the first time, a zero test misclassification rate on this benchmark (Section 5). Finally, in Section 6 we demonstrate how leaky-integrator ESNs can be designed to be inherently time-warping invariant.

For all computations reported in this article we used Matlab. The Matlab code concerning the global parameter optimization method, the “figure eight” and the Japanese Vowel studies is available online at http://www.faculty.iu-bremen.de/hjaeger/pubs.html.

Section snippets

System equations

We consider ESNs with K inputs, N reservoir neurons and L output neurons. Let u = u(t) denote the K-dimensional external input, x = x(t) the N-dimensional reservoir activation state, y = y(t) the L-dimensional output vector, and W^in, W, W^out and W^fb the input/internal/output/output-feedback connection weight matrices of sizes N×K, N×N, L×(K+N) and N×L, respectively. Then the continuous-time dynamics of a leaky-integrator ESN is given by

ẋ = (1/c) ( -a x + f(W^in u + W x + W^fb y) ),
y = g(W^out [x; u]),

where c > 0 is a time constant.
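To make these equations concrete, the following Matlab fragment sketches one Euler step of the leaky-integrator dynamics with a stepsize delta. It is a minimal sketch, not the authors’ code: all sizes and scaling factors are illustrative assumptions, and f = tanh, g = identity are chosen only for concreteness.

% Minimal sketch (not the authors' code) of one Euler step of the
% leaky-integrator dynamics above, with stepsize delta, f = tanh and an
% identity readout g; all sizes and scaling factors are illustrative.
K = 2; N = 100; L = 1;                    % inputs, reservoir units, outputs
Win  = 0.5 * (rand(N, K) - 0.5);          % input weights
W    = full(sprandn(N, N, 0.1));          % sparse random reservoir weights
W    = 0.8 * W / max(abs(eig(W)));        % rescale to spectral radius 0.8
Wfb  = 0.1 * (rand(N, L) - 0.5);          % output feedback weights
Wout = zeros(L, N + K);                   % readout, to be learned by regression

a = 1; c = 1; delta = 0.1;                % leaking rate, time constant, stepsize
x = zeros(N, 1); y = zeros(L, 1);
u = randn(K, 1);                          % placeholder input sample

% Euler approximation of  dx/dt = (1/c) * ( -a*x + f(Win*u + W*x + Wfb*y) )
x = x + (delta / c) * ( -a * x + tanh(Win * u + W * x + Wfb * y) );
y = Wout * [x; u];                        % y = g(Wout * [x; u]) with g = identity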

Optimizing the global parameters

In this section we discuss practical issues around the optimization of the various global parameters that occur in (3). By “optimization” we mainly refer to the goal of achieving a minimal training error. Achieving a minimal test error is delegated to cross-validation schemes which need a method for minimizing the training error as a substep.
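The stochastic gradient procedure developed in this section is not reproduced in this snippet. As a simple stand-in, the following fragment only illustrates the outer optimization loop one might wrap around it: a finite-difference gradient descent over the four global parameters, with the training-error function replaced by a dummy quadratic placeholder (in practice it would rescale the ESN, retrain the readout and return the training MSE).

% Illustrative stand-in, not the paper's method: tune the global parameters
% [input scaling; feedback scaling; leaking rate; spectral radius] by
% gradient descent on a training-error function.  trainError is a dummy
% quadratic placeholder used only so that this sketch runs on its own.
trainError = @(p) sum((p - [0.5; 0.3; 0.7; 0.9]).^2);   % placeholder only

p   = [1; 1; 1; 1];          % initial global parameter vector
eta = 0.1;                   % learning rate
h   = 1e-4;                  % finite-difference step
for it = 1:200
    g = zeros(size(p));
    for k = 1:numel(p)       % one-sided finite-difference gradient estimate
        dp = zeros(size(p)); dp(k) = h;
        g(k) = (trainError(p + dp) - trainError(p)) / h;
    end
    p = p - eta * g;         % plain gradient step (the paper uses a stochastic variant)
end
disp(p.')                    % tuned global parameters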

We first observe that optimizing δ is by and large a non-issue. Raw training data will almost always be available in a discrete-time version with a given

The lazy figure eight

In this section we will train a leaky-integrator ESN to generate a slow “figure eight” pattern in two output neurons, and we will dynamically change the time constant in the ESN equations to slow down and speed up the generated pattern.
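The speed-change mechanism can be sketched as follows. The weights below are random placeholders standing in for a trained figure-eight generator; the point is only how the time constant c rescales the discretized dynamics: increasing c slows the internal dynamics and hence the generated pattern, decreasing c speeds it up.

% Sketch of autonomous pattern replay at different speeds; weights are
% random placeholders for a trained network, not a trained generator.
N = 100; L = 2; a = 1; delta = 0.1;
W    = full(sprandn(N, N, 0.1));
W    = 0.95 * W / max(abs(eig(W)));       % spectral radius 0.95
Wfb  = 0.5 * (rand(N, L) - 0.5);          % feedback from the two outputs
Wout = 0.1 * randn(L, N);                 % placeholder for a trained readout

for c = [2, 1, 0.5]                       % slow, normal, fast replay
    x = zeros(N, 1); y = zeros(L, 1);     % restart from the zero state
    for n = 1:500                         % no external input: autonomous run
        x = x + (delta / c) * ( -a * x + tanh(W * x + Wfb * y) );
        y = Wout * x;
    end
end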

The “figure 8” generation task is a perennial exercise for RNNs (for example, see Pearlmutter (1995), Zegers and Sundareshan (2003) and references therein). The task appears not very complicated, because a “figure 8” can be interpreted as the superposition of a

Data and task description

The “Japanese Vowels” (JV) dataset is a frequently used benchmark for time series classification. The data record utterances of nine Japanese male speakers of the vowel /ae/. Each utterance is represented by 12 LPC cepstrum coefficients. There are 30 utterances per speaker in the training set, totaling 270 samples, and a total of 370 test samples, with lengths distributed unevenly over the speakers
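A generic sketch of ESN-based sequence classification of this kind is given below. It uses synthetic stand-in data, not the JV recordings, and a random placeholder instead of a trained readout: each utterance drives the reservoir, the nine class outputs are accumulated over time, and the class with the largest time-averaged output wins.

% Generic sketch of ESN sequence classification (synthetic stand-in data,
% not the JV recordings; the readout is a random placeholder, not trained).
K = 12; N = 50; nClasses = 9; a = 0.5; c = 1; delta = 1;
Win  = 0.3 * (rand(N, K) - 0.5);
W    = full(sprandn(N, N, 0.2)); W = 0.8 * W / max(abs(eig(W)));
Wout = randn(nClasses, N);                % placeholder for a trained readout

seq = randn(20, K);                       % one synthetic "utterance", 20 frames
x = zeros(N, 1); votes = zeros(nClasses, 1);
for n = 1:size(seq, 1)
    x = x + (delta / c) * ( -a * x + tanh(Win * seq(n, :)' + W * x) );
    votes = votes + Wout * x;             % accumulate class evidence over time
end
[~, predictedClass] = max(votes / size(seq, 1));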

Time warping invariant echo state networks

Time warping of input patterns is a common problem when recognizing human-generated input or dealing with data artificially transformed into time series. The most widely used technique for dealing with time-warped patterns is probably dynamic time warping (DTW) (Itakura, 1975) and its modifications. It is based on finding the cheapest (with respect to some cost function) mapping between the observed signal and a prototype pattern. The cost of this mapping is then used as a classification criterion.
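For reference, a minimal DTW distance computation with a squared-difference local cost is shown below; the sequences are arbitrary stand-ins, and the sketch is included only to make the alignment-cost idea concrete, not as the variant used in any particular comparison.

% Minimal dynamic time warping sketch with a squared-difference local cost;
% s and p are arbitrary stand-in sequences of different lengths.
s = sin(linspace(0, 2*pi, 40));           % observed signal
p = sin(linspace(0, 2*pi, 25));           % prototype pattern
n = numel(s); m = numel(p);
D = inf(n + 1, m + 1); D(1, 1) = 0;       % cumulative alignment cost matrix
for i = 1:n
    for j = 1:m
        cost = (s(i) - p(j))^2;
        D(i + 1, j + 1) = cost + min([D(i, j + 1), D(i + 1, j), D(i, j)]);
    end
end
dtwDistance = D(n + 1, m + 1);            % cost of the cheapest alignment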

Conclusion

Leaky-integrator ESNs are only slightly more complicated to implement and to use than standard ESNs, and appear to us as quite flexible devices when timescale phenomena are involved, where standard ESNs run into difficulties. Caution is, however, advised when simple Euler approximations to the continuous-time leaky-integrator dynamics are used.
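One quick, purely heuristic sanity check in this situation (not the stability criterion derived in Section 2) is to look at the spectral radius of the linearization of the Euler update around the zero state; values at or above one indicate that the chosen stepsize and leaking rate may destroy the echo state property.

% Heuristic check (not the paper's criterion): spectral radius of the
% linearized Euler update  x <- (1 - delta*a/c)*x + (delta/c)*W*x  around 0.
N = 100; a = 1; c = 1; delta = 0.5;
W = full(sprandn(N, N, 0.1)); W = 0.9 * W / max(abs(eig(W)));
M = (1 - delta * a / c) * eye(N) + (delta / c) * W;   % Jacobian at x = 0, tanh'(0) = 1
effectiveRadius = max(abs(eig(M)));
if effectiveRadius >= 1
    warning('Discretized reservoir may not have the echo state property.');
end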

Two questions were encountered which we consider to be of longer-lasting importance:

  • Find computationally efficient ways to optimize the global scaling

Acknowledgments

The work on time-warping invariant ESNs reported here was supported by student contract grants for ML and DP from Planet intelligent systems GmbH, Raben Steinfeld, Germany. The authors would also like to thank five (!) anonymous reviewers of the NIPS 2005 conference, who helped to improve the presentation of Section 6, which once was a NIPS submission. The treatment of the lazy eight task owes much to discussions with J. Steil, R. Linsker, J. Principe and B. Schrauwen. The authors also express

References (31)

  • Hinder, M. R., et al. (2003). The case for an internal dynamics model versus equilibrium point control in human movement. Journal of Physiology.
  • Itakura, F. (1975). Minimum prediction residual principle applied to speech recognition. IEEE Transactions on Acoustics, Speech and Signal Processing.
  • Jaeger, H. (2001). The “echo state” approach to analysing and training recurrent neural networks. GMD report no. 148....
  • Jaeger, H. (2002a). Short term memory in echo state networks. GMD-report no. 152. GMD - German National Research...
  • Jaeger, H. (2002b). Tutorial on training recurrent neural networks, covering BPPT, RTRL, EKF and the echo state network...