2007 Special Issue
Optimization and applications of echo state networks with leaky-integrator neurons
Introduction
The idea that gave birth to the twin pair of echo state networks (ESNs) (Jaeger, 2001) and liquid state machines (LSMs) (Maass, Natschlaeger, & Markram, 2002) is simple: use a large, random, recurrent neural network as an excitable medium (the "reservoir" or "liquid") which, under the influence of input signals, creates a high-dimensional collection of nonlinearly transformed versions of the input (the activations of its neurons), from which a desired output signal can be combined. This simple idea leads to likewise simple offline (Jaeger, 2001) and online (Jaeger, 2003) learning algorithms, yields sometimes amazingly accurate models (Jaeger & Haas, 2004), and may also be realized in vertebrate brains (Mauk and Buonomano, 2004, Stanley et al., 1999).
It is still largely unknown which properties of the reservoir are responsible for which strengths or weaknesses of an ESN on a particular task. Clearly, reservoirs differing in size, connectivity structure, type of neuron or other characteristics will behave differently when put to different learning tasks. Closer analytical investigation and optimization of reservoir dynamics have attracted the attention of several authors (Ozturk, Xu and Principe, 2007, Schiller and Steil, 2005, Schmidhuber et al., 2007, Zant et al., 2004). A door-opener result for a deeper understanding of reservoirs/liquids, in our view, is the work of Maass, Joshi, and Sontag (in press), who show that LSMs with possibly nonlinear output readout functions can approximate dynamical systems of $n$-th order arbitrarily well, if the liquid is augmented by additional units which are trained on suitable auxiliary signals. Finally, it deserves to be mentioned that in theoretical neuroscience the question of how biological networks can process temporal information has been approached in a fashion that is related in spirit to ESNs/LSMs. Precise timing phenomena can be explained as emerging from the network dynamics as such, without the necessity of special timing mechanisms like clocks or delay lines (Mauk & Buonomano, 2004). Buonomano (2005) presents an unsupervised learning rule for randomly connected, spiking neural networks that results in the emergence of neurons representing a continuum of differently timed stimulus responses, while preserving global network stability.
In this paper we add to this growing body of "reservoir research" and take a closer look at ESNs whose reservoir is made from leaky-integrator neurons. Leaky-integrator ESNs were introduced in passing in Jaeger, 2001, Jaeger, 2002b; fragments of what will be reported here appeared first in a technical report (Lukoševičius, Popovici, Jaeger, & Siewert, 2006).
This article is composed as follows. In Section 2 we provide the system equations and point out basic stability conditions, which amount to algebraic criteria for the echo state property (Jaeger, 2001) in leaky-integrator ESNs. Leaky-integrator ESNs have one more global control parameter than standard sigmoid-unit ESNs: in addition to the input and output feedback scaling and the spectral radius of the reservoir weight matrix, a leaking rate has to be optimized. Section 3 explores the impact of these global controls on learning performance and introduces a stochastic gradient descent method for finding the optimal settings. The remainder is devoted to three case studies. First, managing very slow timescales by adjusting the leaky neurons' time constants is demonstrated with the "figure eight" problem (Section 4). This is an autonomous pattern generation task which also presents interesting dynamic stability challenges. Second, we treat the "Japanese Vowels" dataset. Using leaky-integrator neurons and some tricks of the trade, we were able to achieve for the first time a zero test misclassification rate on this benchmark (Section 5). Finally, in Section 6 we demonstrate how leaky-integrator ESNs can be designed which are inherently time-warping invariant.
For all computations reported in this article we used Matlab. The Matlab code concerning the global parameter optimization method, the “figure eight” and the Japanese Vowel studies is available online at http://www.faculty.iu-bremen.de/hjaeger/pubs.html.
System equations
We consider ESNs with $K$ inputs, $N$ reservoir neurons and $L$ output neurons. Let $\mathbf{u}(t)$ denote the $K$-dimensional external input, $\mathbf{x}(t)$ the $N$-dimensional reservoir activation state, $\mathbf{y}(t)$ the $L$-dimensional output vector, and $\mathbf{W}^{\mathrm{in}}$, $\mathbf{W}$, $\mathbf{W}^{\mathrm{out}}$, $\mathbf{W}^{\mathrm{fb}}$ the input/internal/output/output feedback connection weight matrices of sizes $N \times K$, $N \times N$, $L \times (K+N)$ and $N \times L$, respectively. Then the continuous-time dynamics of a leaky-integrator ESN is given by
$$\dot{\mathbf{x}} = \frac{1}{c}\left(-a\,\mathbf{x} + f\left(\mathbf{W}^{\mathrm{in}}\mathbf{u} + \mathbf{W}\mathbf{x} + \mathbf{W}^{\mathrm{fb}}\mathbf{y}\right)\right), \qquad \mathbf{y} = g\left(\mathbf{W}^{\mathrm{out}}[\mathbf{x};\mathbf{u}]\right),$$
where $c > 0$ is a time
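As an illustration, a single Euler-discretized update of a leaky-integrator reservoir can be sketched as follows. This is our own minimal Python sketch, not the paper's Matlab code; the function name, the default values of the leaking rate `a`, stepsize `delta` and time constant `c`, and the choice of `tanh` as the reservoir nonlinearity are illustrative assumptions.

```python
import numpy as np

def esn_step(x, u, y, W_in, W, W_fb, a=0.9, delta=0.1, c=1.0):
    """One Euler step of a leaky-integrator reservoir:
    x(t+delta) = (1 - delta*a/c) * x(t)
                 + (delta/c) * tanh(W_in @ u + W @ x + W_fb @ y).
    Parameter names and defaults are illustrative choices."""
    leak = delta * a / c
    return (1.0 - leak) * x + (delta / c) * np.tanh(W_in @ u + W @ x + W_fb @ y)

# toy dimensions: K = 2 inputs, N = 5 reservoir units, L = 1 output
rng = np.random.default_rng(0)
K, N, L = 2, 5, 1
W_in = rng.normal(size=(N, K))
W = rng.normal(size=(N, N)) * 0.1
W_fb = rng.normal(size=(N, L))
x = np.zeros(N)       # reservoir state
u = rng.normal(size=K)  # one input sample
y = np.zeros(L)       # fed-back output
x_next = esn_step(x, u, y, W_in, W, W_fb)
print(x_next.shape)  # (5,)
```

Note that for `delta * a / c > 1` the explicit Euler step can become unstable even when the continuous-time system is stable, which is one reason for the caution about Euler approximations voiced in the conclusion.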
Optimizing the global parameters
In this section we discuss practical issues around the optimization of the various global parameters that occur in (3). By “optimization” we mainly refer to the goal of achieving a minimal training error. Achieving a minimal test error is delegated to cross-validation schemes which need a method for minimizing the training error as a substep.
We first observe that optimizing the stepsize $\delta$ is by and large a non-issue. Raw training data will almost always be available in a discrete-time version with a given
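One of the global parameters, the spectral radius of the reservoir weight matrix, is commonly set by rescaling a randomly generated matrix. The following sketch shows this standard construction; the function name is ours and not taken from the paper's published Matlab code.

```python
import numpy as np

def scale_spectral_radius(W, rho):
    """Rescale a reservoir matrix so its spectral radius equals rho.
    Multiplying W by a scalar scales all eigenvalues by that scalar,
    so the target radius is hit exactly (up to float rounding)."""
    current = max(abs(np.linalg.eigvals(W)))
    return W * (rho / current)

rng = np.random.default_rng(1)
W = scale_spectral_radius(rng.normal(size=(20, 20)), 0.8)
print(round(max(abs(np.linalg.eigvals(W))), 6))  # 0.8
```

The remaining global parameters (input/feedback scaling, leaking rate) have no such closed-form setting, which is what motivates the gradient-based search discussed in this section.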
The lazy figure eight
In this section we will train a leaky-integrator ESN to generate a slow "figure eight" pattern in two output neurons, and we will dynamically change the time constant in the ESN equations to slow down and speed up the generated pattern.
The "figure 8" generation task is a perennial exercise for RNNs (for example, see Pearlmutter (1995), Zegers and Sundareshan (2003) and references therein). The task does not appear very complicated, because a "figure 8" can be interpreted as the superposition of a
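A common way to write down such a target pattern is as two sinusoids, one at twice the frequency of the other, so that plotting one output channel against the other traces a figure eight. The parametrization below (frequencies, phases, number of samples) is our illustrative choice, not necessarily the paper's exact target signals.

```python
import numpy as np

# Two-channel figure-eight target: a Lissajous-type curve with a 2:1
# frequency ratio between the output channels (illustrative choice).
t = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
y1 = np.sin(2.0 * t)  # first output channel, double frequency
y2 = np.sin(t)        # second output channel
# plotting y1 against y2 traces a figure eight
print(y1.shape, y2.shape)  # (200,) (200,)
```

The dynamical difficulty of the task lies not in producing these signals but in generating them autonomously and stably from output feedback, as discussed in Section 4.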
Data and task description
The "Japanese Vowels" (JV) dataset1 is a frequently used benchmark for time series classification. The data record utterances of nine Japanese male speakers of the vowel /ae/. Each utterance is represented by 12 LPC cepstrum coefficients. There are 30 utterances per speaker in the training set, totaling 270 samples, and a total of 370 test samples, whose numbers are distributed unevenly over the speakers
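For classification with an ESN readout, a simple and widely used decision rule is to give the network one indicator output per speaker class, average each output over the duration of an utterance, and pick the class with the largest average. The sketch below illustrates only this decision rule; it does not reproduce the paper's specific "tricks of the trade".

```python
import numpy as np

def classify_utterance(Y):
    """Y: array of shape (T, 9), the readout activations over an
    utterance of length T, one column per speaker class.
    Decision rule: time-average each class output, take the argmax."""
    return int(np.argmax(Y.mean(axis=0)))

# toy example: class 3 has the strongest average activation
Y = np.zeros((50, 9))
Y[:, 3] = 0.9
Y[:, 5] = 0.2
print(classify_utterance(Y))  # 3
```

Averaging over time makes the rule insensitive to the varying utterance lengths in the JV data, since utterances of different duration yield decisions on the same scale.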
Time warping invariant echo state networks
Time warping of input patterns is a common problem when recognizing human generated input or dealing with data artificially transformed into time series. The most widely used technique for dealing with time-warped patterns is probably dynamic time warping (DTW) (Itakura, 1975) and its modifications. It is based on finding the cheapest (w.r.t. some cost function) mapping between the observed signal and a prototype pattern. The price of the mapping is then taken as a classification criterion.
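The DTW baseline mentioned above can be stated compactly. The following is a textbook dynamic-programming implementation of the DTW distance between two one-dimensional sequences, using the absolute difference as the local cost; it is included for reference and is not code from the paper.

```python
import numpy as np

def dtw_distance(s, p):
    """Dynamic-time-warping distance between sequences s and p.
    D[i, j] holds the cheapest cumulative cost of aligning the first
    i elements of s with the first j elements of p."""
    n, m = len(s), len(p)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - p[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# a time-warped (stretched) copy of a pattern is close under DTW
a = [0.0, 1.0, 2.0, 1.0, 0.0]
b = [0.0, 1.0, 1.0, 2.0, 2.0, 1.0, 0.0]
print(dtw_distance(a, b))  # 0.0
```

The quadratic cost of filling the table, and the need for explicit prototype patterns, are among the reasons to look for recognizers that are inherently time-warping invariant, as pursued in Section 6.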
Conclusion
Leaky-integrator ESNs are only slightly more complicated to implement and to use than standard ESNs, and appear to us to be quite flexible devices when timescale phenomena are involved, where standard ESNs run into difficulties. Caution is, however, advised when simple Euler approximations to the continuous-time leaky-integrator dynamics are used.
Two questions were encountered which we consider to be of longer-lasting importance:
- Find computationally efficient ways to optimize the global scaling
Acknowledgments
The work on time-warping invariant ESNs reported here was supported by student contract grants for ML and DP from Planet intelligent systems GmbH, Raben Steinfeld, Germany. The authors would also like to thank five (!) anonymous reviewers of the NIPS 2005 conference, who helped to improve the presentation of Section 6, which once was a NIPS submission. The treatment of the lazy eight task owes much to discussions with J. Steil, R. Linsker, J. Principe and B. Schrauwen. The authors also express
References (31)
- et al. (1999). Multidimensional curve classification using passing-through regions. Pattern Recognition Letters.
- A tutorial on hidden Markov models and selected applications in speech recognition.
- et al. (2005). Analyzing the weight dynamics of recurrent learning algorithms. Neurocomputing.
- Barber, D. (2003). Dynamic Bayesian networks with deterministic latent tables. In Proc. NIPS 2003....
- et al. (2006). A tighter bound for the echo state property. IEEE Transactions on Neural Networks.
- (2005). A learning rule for the emergence of stable dynamics and timing in recurrent networks. Journal of Neurophysiology.
- Duin, R. P. W. (2002). The combining classifier: To train or not to train? In R. Kasturi, D. Laurendeau, & C. Suen...
- (1998). Adaptive filters: Theory and applications.
- Geurts, P. (2001). Pattern extraction for time series classification. In L. De Raedt, A. Siebes (Eds.), Proc. PKDD 2001...
- et al. (2001). The elements of statistical learning.