Quantitative analysis of neural networks is critical to improving their performance. In this paper, we investigate long-term time series prediction based on an echo state network operating at the edge of chaos. We also assess the eigenfunction of echo state networks and its criticality using Hermite polynomials. A Hermite polynomial-based activation function design with fast convergence is proposed, and the relation between long-term time dependence and edge-of-chaos criticality is given. A new particle swarm optimization-gravitational search algorithm is put forward to improve the parameter estimation that helps the network reach the edge of chaos. The method was verified using a chaotic Lorenz system and a real health index data set. The experimental results indicate that evolution gives the reservoir great potential to run at the edge of chaos with rich expressive power.

The work was supported by the National Science Foundation of China (61473183, 61521063, 61627810), the National Key R&D Program of China (SQ2017YFGH001005), the Scientific and Technological Project in Henan Province (172102210255), and the CERNET Innovation Project (No. NGII20160517).
Appendix A: Approaching equilibrium points by Hermite polynomials
The proofs of all the theorems we quote can be found in Chapter 2 and Chapter 5 of [57] or similar textbooks. A complete normed vector space is called a Banach space. A Hilbert space is a Banach space whose norm is induced by an inner product. Let H be a Hilbert space of functions from S to \(\mathbb {R}\). The following Contraction-Mapping Theorem, also known as Banach’s Fixed-Point Theorem, guarantees the existence and uniqueness of a fixed point and underlies the existence and uniqueness of solutions of differential equations.
Theorem 1
Let \(T:S\rightarrow S\) be a contraction mapping. Then the equation Tx = x has exactly one solution in S, and this unique solution x is the limit of the sequence x(n) defined by \(x(n) = Tx(n-1),n= 1,2,\dots \), with \(x(0) = x_{0}\), expressed as:
\[x = \lim_{n\rightarrow \infty } x(n) = \lim_{n\rightarrow \infty } T^{n}x_{0},\]
where \(x_{0}\) is an arbitrary initial element in S.
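As a numerical illustration of Theorem 1, the following minimal sketch iterates a contraction from two different starting points and checks that both runs reach the same fixed point. The helper name `fixed_point`, the tolerance, and the choice of the cosine map on S = [0, 1] are assumptions made for illustration only; they do not come from the paper.

```python
import math

def fixed_point(T, x0, tol=1e-12, max_iter=1000):
    """Iterate x(n) = T(x(n-1)) until successive iterates differ by less than tol.

    Returns the final iterate and the number of iterations used.
    """
    x = x0
    for n in range(1, max_iter + 1):
        x_next = T(x)
        if abs(x_next - x) < tol:
            return x_next, n
        x = x_next
    raise RuntimeError("no convergence within max_iter iterations")

# Assumed example: cos maps [0, 1] into itself and |cos'(x)| = |sin x| <= sin(1) < 1
# on that interval, so it is a contraction on S = [0, 1].
x_a, n_a = fixed_point(math.cos, 0.0)   # start at the left end of S
x_b, n_b = fixed_point(math.cos, 1.0)   # start at the right end of S

print("from x0 = 0.0:", x_a, "after", n_a, "iterations")
print("from x0 = 1.0:", x_b, "after", n_b, "iterations")
```

Both runs should agree up to the tolerance, which reflects the claim that the limit does not depend on the initial element.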
The theorem not only establishes the existence and uniqueness of solutions to differential equations, but also provides a way to find them by iteration. The following is a constructive extension of the theorem.
Lemma 1
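If L is a self-adjoint operator such that there exist B ≥ A > 0 satisfying
\[\forall f\in H,\quad A\|f\|^{2} \le \langle Lf, f\rangle \le B\|f\|^{2}, \tag{17}\]
then L is invertible and
\[\forall f\in H,\quad \frac{1}{B}\|f\|^{2} \le \langle L^{-1}f, f\rangle \le \frac{1}{A}\|f\|^{2}. \tag{18}\]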
Inequality (17) shows that the eigenvalues of L lie between A and B. In finite dimensions, L is diagonalizable in an orthonormal basis since it is self-adjoint. It is therefore invertible with eigenvalues between \(B^{-1}\) and \(A^{-1}\), which proves (18).
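The finite-dimensional argument can be checked numerically. The sketch below is an illustration under assumptions, not part of the paper: it builds a random symmetric positive-definite matrix as a stand-in for L, takes A and B to be its extreme eigenvalues, and tests the two quadratic-form bounds for one random vector f.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed stand-in for L: a symmetric matrix with all eigenvalues >= 1.
M = rng.standard_normal((6, 6))
L = M @ M.T + np.eye(6)

eig = np.linalg.eigvalsh(L)          # eigenvalues in ascending order
A, B = eig[0], eig[-1]               # tightest constants for the lower and upper bound

f = rng.standard_normal(6)
norm2 = f @ f                        # ||f||^2
q = f @ L @ f                        # <Lf, f>
q_inv = f @ np.linalg.solve(L, f)    # <L^{-1} f, f>

# Eigenvalues of the inverse, ascending, should equal 1/B, ..., 1/A.
eig_inv = np.linalg.eigvalsh(np.linalg.inv(L))

print("A, B:", A, B)
print("bound on <Lf, f> holds:", A * norm2 <= q <= B * norm2)
print("bound on <L^-1 f, f> holds:", norm2 / B <= q_inv <= norm2 / A)
```

Choosing A and B as the extreme eigenvalues gives the sharpest bounds; any smaller A or larger B would also satisfy the hypothesis.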
Taking the probabilists’ weight function \(p(x) = e^{-x^{2}/2}\) and applying Lemma 1, it follows that the Hermite polynomials are orthogonal on the interval \((-\infty , \infty )\) with respect to this weight function, and we obtain the following important results:
We use the following facts about the Hermite polynomials (see Chapter 11 in [58]):
Appendix B: Examples
Next, we analyze the eigenvalues of several popular activation functions under Hermite polynomials and their effects on the convergence and criticality of neural networks.
1.1 Sigmoid activation
The Sigmoid function is \( \sigma (x)= \frac {1} {e^{-x} + 1} \). Since \(H_{0}(x) = 1, H_{1}(x) = x, \text { and } H_{2}(x) = \frac {x^{2}-1}{\sqrt {2}}\), substituting the Sigmoid activation into (19) and using the orthogonality of the Hermite polynomials gives the first two Hermite coefficients, expressed as:
For n ≥ 3, we write \(g_{n}(x) = \frac {e^{-x^{2}/2}}{(1+e^{-x})}H_{n}(x)\). According to (21), the derivative of \(g_{n}(x)\) is:
Since (25) tends to 0 as \(x\to \infty \), we get:
Therefore, the Hermite coefficients of the Sigmoid activation are expressed as:
We find that the Sigmoid activation attenuates values of larger magnitude. When this attenuation is compounded over many layers, the overall gradient becomes very small.
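The Hermite coefficients of the Sigmoid activation can also be estimated numerically. The following minimal sketch assumes the coefficients are the expectations \(a_{n} = \mathbb {E}[\sigma (Z)H_{n}(Z)]\) with \(Z\) standard normal and \(H_{n}\) the probabilists' Hermite polynomials scaled by \(1/\sqrt {n!}\), which agrees with the \(H_{0}, H_{1}, H_{2}\) quoted above. The helper names and the number of quadrature nodes are hypothetical choices for illustration.

```python
import math
import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval

def normalized_hermite(n, x):
    """Probabilists' Hermite polynomial He_n(x) divided by sqrt(n!)."""
    c = np.zeros(n + 1)
    c[n] = 1.0                            # select the single basis polynomial He_n
    return hermeval(x, c) / math.sqrt(math.factorial(n))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hermite_coefficients(f, n_max, n_nodes=80):
    """Estimate a_n = E[f(Z) h_n(Z)], Z ~ N(0, 1), by Gauss-Hermite quadrature."""
    x, w = hermegauss(n_nodes)            # nodes and weights for the weight exp(-x^2 / 2)
    w = w / math.sqrt(2.0 * math.pi)      # rescale so the weights sum to 1
    return np.array([np.sum(w * f(x) * normalized_hermite(n, x))
                     for n in range(n_max + 1)])

a = hermite_coefficients(sigmoid, 7)
for n, a_n in enumerate(a):
    print(n, f"{a_n:+.6f}")
```

Under these assumptions the even-order coefficients beyond \(a_{0}\) vanish, because \(\sigma (x) - \frac {1}{2}\) is an odd function, and the odd-order coefficients shrink in magnitude as n grows.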
1.2 Normalized ReLU activation
Consider the normalized ReLU activation \({f}(x) = \sqrt {2}\max (0, x)\). Substituting it into (19) gives the corresponding Hermite coefficients:
For n ≥ 3, we write \(g_{n}(x) = x H_{n}(x) e^{-\frac {x^{2}}{2}}\), whose derivative is:
Since (30) tends to 0 as \(x\to \infty \), we get:
Therefore, the Hermite coefficients of the ReLU activation are expressed as:
The maximum eigenvalue is \(\frac {1}{\sqrt {2}}\), and the eigenvalues then gradually decay toward the critical point 0. In particular, \((a_{0}, a_{1}, a_{2}, a_{3}, a_{4}, a_{5}, a_{6}) = (\frac {1}{\sqrt {\pi }} ,\frac {1}{\sqrt {2}}, \frac {1}{\sqrt {2\pi }}, 0, \frac {1}{\sqrt {24\pi }}, 0, \frac {1}{\sqrt {80\pi }})\). We also see that, from \(a_{3}\) onward, every second eigenvalue is 0, which may cause the network to stop propagating information and stall at saddle points rather than reach the expected global optimum, especially under gradient descent [59].
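The coefficient values quoted for the normalized ReLU can be cross-checked numerically. The sketch below assumes \(a_{n} = \mathbb {E}[f(Z)H_{n}(Z)]\) with \(Z\) standard normal and \(H_{n}\) scaled by \(1/\sqrt {n!}\), and integrates over \([0, 12]\) with a plain trapezoidal rule, since the activation is zero for negative inputs. The helper names, the truncation point, and the grid size are hypothetical. Under this convention the computed \(a_{4}\) comes out negative, so the comparison with the quoted values is made on magnitudes.

```python
import math
import numpy as np
from numpy.polynomial.hermite_e import hermeval

def normalized_hermite(n, x):
    """Probabilists' Hermite polynomial He_n(x) divided by sqrt(n!)."""
    c = np.zeros(n + 1)
    c[n] = 1.0
    return hermeval(x, c) / math.sqrt(math.factorial(n))

def relu_hermite_coefficient(n, upper=12.0, num=200_001):
    """Estimate a_n = E[sqrt(2) * max(0, Z) * h_n(Z)] for Z ~ N(0, 1).

    The integrand vanishes for x < 0, so only [0, upper] is integrated,
    using the trapezoidal rule on a uniform grid.
    """
    x = np.linspace(0.0, upper, num)
    density = np.exp(-0.5 * x**2) / math.sqrt(2.0 * math.pi)
    y = math.sqrt(2.0) * x * normalized_hermite(n, x) * density
    dx = x[1] - x[0]
    return dx * (y.sum() - 0.5 * (y[0] + y[-1]))

# Values quoted in the text for a_0, ..., a_6.
quoted = [1 / math.sqrt(math.pi), 1 / math.sqrt(2), 1 / math.sqrt(2 * math.pi), 0.0,
          1 / math.sqrt(24 * math.pi), 0.0, 1 / math.sqrt(80 * math.pi)]
computed = [relu_hermite_coefficient(n) for n in range(7)]

for n, (c, q) in enumerate(zip(computed, quoted)):
    print(n, f"computed {c:+.6f}", f"quoted magnitude {q:.6f}")
```

The zero entries at \(a_{3}\) and \(a_{5}\) appear directly in the numerical output, which matches the alternating pattern of vanishing eigenvalues described above.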
Zhang, G., Zhang, C. & Zhang, W. Evolutionary echo state network for long-term time series prediction: on the edge of chaos. Appl Intell 50, 893–904 (2020). https://doi.org/10.1007/s10489-019-01546-w