
Neural Networks

Volume 24, Issue 5, June 2011, Pages 440-456

Architectural and Markovian factors of echo state networks

https://doi.org/10.1016/j.neunet.2011.02.002

Abstract

Echo State Networks (ESNs) constitute an emerging approach for efficiently modeling Recurrent Neural Networks (RNNs). In this paper we investigate some of the main aspects that can account for the success and limitations of this class of models. In particular, we propose complementary classes of factors related to the contractivity and architecture of reservoirs, and we study their relative relevance.

First, we show the existence of a class of tasks for which ESN performance is independent of the architectural design. The effect of the Markovian factor, which characterizes a significant class within these cases, is shown by introducing instances of easy/hard tasks for ESNs with contractive reservoir dynamics.

In the complementary cases, for which architectural design is effective, we investigate and decompose the aspects of network design that allow a larger reservoir to progressively improve the predictive performance. In particular, we introduce four key architectural factors: input variability, multiple time-scales dynamics, non-linear interactions among units and regression in an augmented feature space. To investigate the quantitative effects of the different architectural factors within this class of tasks successfully approached by ESNs, variants of the basic ESN model are proposed and tested on instances of datasets of different nature and difficulty.

Experimental evidence confirms the role of the Markovian factor and shows that all the identified key architectural factors play a major role in determining ESN performance.

Introduction

Recurrent Neural Networks (RNNs) are a widely known class of neural network models used for sequential data processing. Reservoir Computing (RC) (e.g. Lukoševičius and Jaeger, 2009, Verstraeten et al., 2007) is a denomination for a class of RNN models that are characterized by a conceptual separation between a recurrent dynamical part and a simple non-recurrent output tool. The striking feature of RC is that the recurrent part of the network can be left untrained after initialization as long as it satisfies some very easy-to-check properties. Learning is then restricted to the recurrent-free output part, leading to a very efficient RNN design. RC comprises several classes of RNN models, including the popular Echo State Networks (ESNs) (Jaeger, 2001, Jaeger and Haas, 2004), Liquid State Machines (LSMs) (Maass, Natschlager, & Markram, 2002) and other approaches such as BackPropagation Decorrelation (BPDC) (Steil, 2004, Steil, 2006) and Evolino (Schmidhuber, Wierstra, Gagliolo, & Gomez, 2007). In this paper we focus on the ESN approach.

An ESN typically consists of a large, sparsely connected, untrained reservoir layer of recurrent neurons, connected to a simple trained readout layer of linear neurons. A valid reservoir satisfies a condition on the state dynamics called the Echo State Property (ESP).
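As a concrete illustration of this separation between a fixed reservoir and a trained readout, the following is a minimal sketch, not the authors' implementation: the sizes, weight scalings and the one-step-ahead sine prediction task are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes, not taken from the paper.
N_u, N_x = 1, 50                     # input and reservoir dimensions
W_in = rng.uniform(-0.1, 0.1, (N_x, N_u))
W = rng.uniform(-1.0, 1.0, (N_x, N_x))
W *= 0.9 / max(abs(np.linalg.eigvals(W)))  # rescale spectral radius to 0.9

def reservoir_states(inputs):
    """Drive the fixed, untrained reservoir with an input sequence."""
    x = np.zeros(N_x)
    states = []
    for u in inputs:
        x = np.tanh(W_in @ np.atleast_1d(u) + W @ x)
        states.append(x)
    return np.array(states)

# Only the linear readout is trained (here via ridge regression),
# illustrated on one-step-ahead prediction of a sine wave.
u_seq = np.sin(0.2 * np.arange(300))
X = reservoir_states(u_seq[:-1])
y = u_seq[1:]
W_out = np.linalg.solve(X.T @ X + 1e-6 * np.eye(N_x), X.T @ y)
pred = X @ W_out
```

Only `W_out` is learned; `W_in` and `W` keep their random initialization throughout, which is what makes the training so efficient.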

ESNs have been successfully applied in several sequential domains, such as non-linear system identification (e.g. Jaeger, 2002a), robot control (e.g. Hertzberg et al., 2002, Ishu et al., 2004, Plöger et al., 2003), speech processing (e.g. Skowronski & Harris, 2006), time series prediction and noise modeling (e.g. Jaeger & Haas, 2004).

However, some doubts remain about the success of ESNs on practical tasks, with particular regard to problems for which standard RNNs have achieved good performance (Prokhorov, 2005). Moreover, a number of more theoretical open issues remain and motivate the research effort in the ESN area. Some of the main research topics on ESNs (Jaeger, 2005) focus on the optimization of reservoirs towards specific problems (Ishu et al., 2004, Schmidhuber et al., 2007, Schrauwen et al., 2008), the role of the topological organization of reservoirs (Yanbo, Le, & Haykin, 2007) and the properties of reservoirs that are responsible for successful or unsuccessful applications (Hajnal and Lorincz, 2006, Ozturk et al., 2007). In particular, this last topic, considered in relation to the reservoir architecture and its (usually) high dimensionality, is of special interest for the aims of this paper.

Other aspects concerning the optimal design of ESNs, involving the setting of hyper-parameters of the reservoir such as the input scaling, the bias, the spectral radius and the settling time (see e.g. Venayagamoorthy and Shishir, 2009, Verstraeten et al., 2010), lie outside the aims of this paper.

An important feature of ESNs is the contractivity of the reservoir state transition function, which always guarantees stability of the network state dynamics (regardless of other initialization aspects) and the ESP (and therefore valid reservoirs). Moreover, under a contractive setting, the network state dynamics are bounded within a region of the state space with interesting properties. The characteristics of contractive state mappings have already been investigated in the contexts of Iterated Function Systems (IFSs), variable memory length predictive models, fractal theory, and the bias of trainable RNNs initialized with small weights (Hammer and Tiňo, 2003, Tiňo et al., 2004, Tiňo and Hammer, 2003, Tiňo et al., 2007). It is a known fact that RNNs initialized with contractive state transition functions are able to discriminate among different (recent) input histories even prior to learning (Hammer and Tiňo, 2003, Tiňo et al., 2004), according to a Markovian organization of the state dynamics. Such a characterization also applies to ESNs (e.g. Tiňo et al., 2007), although in this context it has not yet been completely clarified, and investigations into the possibilities and limitations of the ESN approach due to the Markovian nature of state dynamics are needed.
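The stability guaranteed by contractivity can be observed numerically. The sketch below uses illustrative sizes and scalings rather than the paper's specific setting; bounding the spectral norm `||W||_2` below 1 is a standard sufficient condition for contractivity with tanh units. After a contractive reservoir processes a long enough input sequence, its state no longer depends on the initial state.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 30
W = rng.uniform(-1.0, 1.0, (N, N))
# ||W||_2 < 1 is a sufficient condition for contractivity with tanh units.
W *= 0.8 / np.linalg.svd(W, compute_uv=False)[0]
W_in = rng.uniform(-0.5, 0.5, (N, 1))

def run(x, inputs):
    """Iterate the state transition function from initial state x."""
    for u in inputs:
        x = np.tanh(W_in @ np.array([u]) + W @ x)
    return x

inputs = rng.uniform(-1.0, 1.0, 100)
x_a = run(rng.normal(size=N), inputs)   # two different initial states,
x_b = run(rng.normal(size=N), inputs)   # same input sequence
gap = float(np.linalg.norm(x_a - x_b))  # contraction washes out the initial state
```

After 100 contractive steps the distance between the two trajectories is bounded by roughly 0.8^100 times the initial gap, i.e. it is numerically zero: the reservoir has "echo states".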

In particular, ESNs exploit the consequences of the Markovianity of state dynamics in combination with the typically high dimensionality and non-linearity of the recurrent reservoir. The importance of richly varied ESN state dynamics within a large number of reservoir units has been theoretically and experimentally pointed out in the ESN literature (e.g. Jaeger, 2001, Jaeger and Haas, 2004, Tiňo et al., 2007, Verstraeten et al., 2007), although neither completely analyzed nor empirically evaluated. Moreover, a high dimensional reservoir constitutes the basis for arguing a universal approximation property with bounded memory of ESNs, even in the presence of a linear readout layer (Tiňo et al., 2007). Indeed, although the Markovian organization of the reservoir state space rules the dynamics of ESNs, it is known (e.g. Jaeger, 2002c, Makula et al., 2004, Verstraeten et al., 2007) that large reservoirs achieve predictive results on sequence tasks that improve almost proportionally with the number of reservoir units. The Markovian characterization of the reservoir state space therefore seems insufficient to completely explain the performance of the model.

These points open interesting issues, motivating our investigation on the factors which may influence the model behavior and on the assessment of their relative importance. In particular, adopting a critical perspective as in Prokhorov (2005), we are interested in the complementary investigation of characterizing (and not only of identifying) classes of tasks to which ESNs can be successfully/unsuccessfully applied.

In this paper, to approach these investigations still lacking in the ESN literature, we directly consider the Markovianity of reservoir dynamics in relation to the issue of identifying relevant factors that might determine the success and limitations of the ESN model, and we specifically study it in relation to other architectural factors of network design.

Complementarily, on tasks for which ESNs show good results, we pose the question of identifying the sources of richness in reservoir dynamics that can be fruitfully exploited in terms of the predictive accuracy (performance in the following) of the model. The aspects of high dimensionality and non-linearity of reservoirs are studied by asking to what extent the performance improvements obtained by increasing the number of recurrent reservoir units are due to a larger number of non-linear recurrent dynamics or to the possibility of regression in an augmented feature space. We also propose a study of different architectural factors of ESN design which allow the reservoir units to effectively diversify their activations and lead to an enrichment of the state dynamics. This is done by measuring and comparing the effects on performance due to the inclusion of individual factors and combinations of factors in the design of ESNs. This study also investigates the effect on ESN performance of sparsity among reservoir unit connections, which is commonly claimed to be a crucial feature of ESN modeling.

Recently, there has been a growing interest in studying architectural variants and simplifications of the standard ESN model. In particular, a number of reservoir models with an even simpler architecture than the ESN have been proposed. A model with self-recurrent connections only, linear reservoir neurons and unitary input-to-reservoir weights, the so called “Simple ESN” (SESN), was presented in Fette and Eggert (2005). A feed-forward variant of the ESN, the “Feed-Forward ESN” (FFESN), was introduced in Cernanský and Makula (2005), while in Cernanský and Tiňo (2008) a further simplification of the model, with reservoir units organized into a tapped delay line, was proposed. Our work, being directed towards a deeper understanding of the comparative effects of different architectural factors of ESN design on predictive performance, can also be placed in this research direction.
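To give a sense of how simple such variants can be, the following sketches an SESN-style reservoir in the spirit of Fette and Eggert (2005): self-recurrent connections only (a diagonal recurrent matrix), linear units, and unitary input weights. The size and the range of self-loop weights are illustrative assumptions, not values from that paper.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 20
# Self-recurrent connections only: the recurrent matrix is diagonal,
# units are linear, and every input-to-reservoir weight is fixed to 1.
w_self = rng.uniform(0.1, 0.9, N)   # one self-loop weight per unit, |w| < 1

def sesn_states(inputs):
    """Each unit keeps an exponential trace of the input with its own decay."""
    x = np.zeros(N)
    states = []
    for u in inputs:
        x = w_self * x + u          # linear update, unitary input weight
        states.append(x.copy())
    return np.array(states)
```

Under a constant input of 1, unit i converges to 1/(1 - w_i), so even this degenerate reservoir exposes the input history to the readout at N different time scales.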

According to the motivations described above, in short, the aims of this paper can be summarized as follows. We outline complementary cases of the ESN behavior. First, independently of the architectural network design (and reservoir dimensionality), we provide a characterization of contractive ESNs, captured by the concept of Markovian factor. Then, we identify relevant factors of architectural ESN design that allow a larger dimensional reservoir to be effective in terms of network predictive performance. In the approach adopted in the paper, the existence of such cases and the relative relevance of the proposed factors are concretely assessed by specific instances where the effect can be empirically evaluated.

The rest of the paper is organized as follows. Section 2 reviews the ESN model in the framework of RNN processing of sequential data. Section 3 focuses on the Markovian organization of reservoir state dynamics. Section 4 introduces the identified architectural factors of ESN design and the corresponding architectural variants proposed for the standard ESN model. Experimental results are illustrated in Section 5, first by discussing the influence of Markovianity on ESN performance, and then by assessing the relevance of the proposed architectural factors on tasks commonly considered in the ESN literature, showing a significant effect of the reservoir dimensionality. Finally, Section 6 summarizes the main general results of the paper.

Section snippets

Recurrent and echo states models for sequence processing

In this paper we are interested in processing sequence domains. In the following, an input element and an input sequence are represented by u and s(u), respectively. In particular, if s(u) is of length n, then we can write its elements using the notation s(u)=[u(1),u(2),…,u(n)], where u(1) is the oldest entry and u(n) is the most recent one. An empty input sequence is denoted by s(u)=[]. The concatenation of the sequences s(u) and s(v) is denoted by s(u)s(v). An output element and an output

Markovian factor of ESNs

For the aims of this paper, we say that a state model on sequence domains has a state space organization of a Markovian nature whenever the states assumed in correspondence with two different input sequences sharing a common suffix are close to each other, proportionally to the length of the common suffix. This Markovian characterization of the state space dynamics is referred to in this paper as the Markovian factor. A class of models on sequences on which the concept of Markovian factor applies is
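This suffix-based closeness can be observed directly on a small contractive reservoir. The sketch below uses illustrative sizes and scalings (it is not an experiment from the paper): two sequences with unrelated prefixes end in states whose distance shrinks as the shared suffix gets longer.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 30
W = rng.uniform(-1.0, 1.0, (N, N))
W *= 0.7 / np.linalg.svd(W, compute_uv=False)[0]  # contractive setting
W_in = rng.uniform(-0.5, 0.5, (N, 1))

def final_state(seq):
    """State reached after processing the whole sequence from the zero state."""
    x = np.zeros(N)
    for u in seq:
        x = np.tanh(W_in @ np.array([u]) + W @ x)
    return x

suffix = list(rng.uniform(-1.0, 1.0, 20))
prefix_a = list(rng.uniform(-1.0, 1.0, 50))   # two unrelated pasts
prefix_b = list(rng.uniform(-1.0, 1.0, 50))

# State distance after sharing a short versus a long common suffix.
d_short = float(np.linalg.norm(final_state(prefix_a + suffix[:5])
                               - final_state(prefix_b + suffix[:5])))
d_long = float(np.linalg.norm(final_state(prefix_a + suffix)
                              - final_state(prefix_b + suffix)))
```

Each shared step contracts the distance between the two trajectories by at least the contraction coefficient, so `d_long` is several orders of magnitude smaller than `d_short`: the state encodes the recent suffix and forgets the differing prefixes.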

Architectural factors of ESN design

Even though reservoir dynamics are governed by the Markovian factor, there are still several other factors, related to the architectural design, which might influence the richness of the Markovian dynamics and thus the performance of ESNs. Indeed, ESNs with the same contractive coefficient but different topologies can lead to different results on the same task. At the same time, the richness of the dynamics is related to the growth of the number of units (reservoir dimensionality). It is

Experimental results

The experiments presented in the following aim at testing the empirical effects of the factors introduced in Sections 3 and 4. First, in Section 5.2, we use two tasks to show the condition underlying the ESN state space organization, i.e. the Markovian assumption. Under such an extreme condition we show that the Markovian factor dominates the behavior of the model, so that complex architectures are not even necessary. In particular, the

Conclusions

Markovianity and high dimensionality (along with non-linearity) of the reservoir state space representation have been shown to have a relevant influence on the behavior and performance of the ESN model. Such factors have complementary roles and characterize distinct classes of tasks, for which we have provided representative instances. In the following, the findings are detailed, distinguishing the case for which Markovianity has a prominent role independent of the architectural design, and the

References (47)

  • M. Buehner et al. A tighter bound for the echo state property. IEEE Transactions on Neural Networks (2006).
  • Butcher, J., Verstraeten, D., Schrauwen, B., Day, C., & Haycock, P. (2010). Extending reservoir computing with random...
  • Cernanský, M., & Makula, M. (2005). Feed-forward echo state networks. In Proceedings of the IEEE international joint...
  • Cernanský, M., & Tiňo, P. (2007). Comparison of echo state networks with simple recurrent networks and variable-length...
  • M. Cernanský et al. Predictive modeling with echo state networks.
  • T. Cover. Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition. IEEE Transactions on Electronic Computers (1965).
  • L. Feldkamp et al. A signal processing framework based on dynamic neural networks with application to problems in adaptation, filtering, and classification. Proceedings of the IEEE (1998).
  • G. Fette et al. Short term memory and pattern matching with simple echo state networks.
  • Gallicchio, C., & Micheli, A. (2009). On the predictive effects of Markovian and architectural factors of echo state...
  • Hajnal, M., & Lorincz, A. (2006). Critical echo state networks. In Proceedings of the international conference on...
  • B. Hammer et al. Recurrent neural networks with small weights implement definite memory machines. Neural Computation (2003).
  • J. Hertzberg et al. Learning to ground fact symbols in behavior-based robots.
  • Ishu, K., van der Zant, T., Becanovic, V., & Ploger, P. (2004). Identification of motion with echo state network. In...