Abstract
Quantitative analysis of neural networks is a critical issue for improving their performance. In this paper, we investigate long-term time series prediction based on an echo state network operating at the edge of chaos. We also assess the eigenfunctions of echo state networks and their criticality by means of Hermite polynomials. A Hermite polynomial-based activation function design with fast convergence is proposed, and the relation between long-term time dependence and edge-of-chaos criticality is established. A new particle swarm optimization-gravitational search algorithm is put forward to improve parameter estimation, which helps the network attain the edge of chaos. The method was verified on a chaotic Lorenz system and a real health index data set. The experimental results indicate that evolution gives the reservoir great potential to run at the edge of chaos with rich expressivity.






References
Quan H, Srinivasan D, Khosravi A (2017) Short-term load and wind power forecasting using neural network-based prediction intervals. IEEE Trans Neural Netw Learn Syst 25(2):303–315
Qin M, Du Z, Du Z (2017) Red tide time series forecasting by combining ARIMA and deep belief network. Knowl-Based Syst 125:39–52
Abaszade M, Effati S (2018) Stochastic support vector regression with probabilistic constraints. Appl Intell 48(1):243–256
Williams RJ, Zipser D (1989) A learning algorithm for continually running fully recurrent neural networks. Neural Comput 1(2):270–280
Siegelmann HT, Sontag ED (1995) On the computational power of neural nets. J Comput Syst Sci 50(1):440–449
Hochreiter S, Schmidhuber J (1997) Long short-term memory. Neural Comput 9(8):1735–1780
Cho K, van Merrienboer B, Bahdanau D, Bengio Y (2014) On the properties of neural machine translation: Encoder-decoder approaches. In: Proceedings of Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, pp 103–111
Jaeger H, Haas H (2004) Harnessing nonlinearity: Predicting chaotic systems and saving energy in wireless communication. Science 304(5667):78–80
Lai G, Chang W, Yang Y, Liu H (2018) Modeling long- and short-term temporal patterns with deep neural networks. In: The 41st international ACM SIGIR conference on research & development in information retrieval, SIGIR 2018, pp 95–104
Langton CG (1990) Computation at the edge of chaos: Phase transitions and emergent computation. Physica D: Nonlinear Phenom 42(1–3):12–37
Trillos NG, Murray R (2016) A new analytical approach to consistency and overfitting in regularized empirical risk minimization. Eur J Appl Math 28(6):36
Bahdanau D, Cho K, Bengio Y (2015) Neural machine translation by jointly learning to align and translate. In: 3rd International Conference on Learning Representations, ICLR
Cinar YG, Mirisaee H, Goswami P, Gaussier E, Aït-Bachir A, Strijov V (2017) Position-based content attention for time series forecasting with sequence-to-sequence rnns. In: International Conference on Neural Information Processing. Springer, pp 533–544
Liang Y, Ke S, Zhang J, Yi X, Zheng Y (2018) GeoMAN: Multi-level attention networks for geo-sensory time series prediction. In: Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, pp 3428–3434
Liu T, Yu S, Xu B, Yin H (2018) Recurrent networks with attention and convolutional networks for sentence representation and classification. Appl Intell 48(10):3797–3806
Yi S, Guo J, Xin L, Kong Q, Guo L, Wang L (2018) Long-term prediction of polar motion using a combined SSA and ARMA model. J Geodesy 92(3):333–343
Dai C, Pi D (2017) Parameter auto-selection for hemispherical resonator gyroscope's long-term prediction model based on cooperative game theory. Knowl-Based Syst 134:105–115
Cannon DM, Goldberg SR (2015) Simple rules for thriving in a complex world, and irrational things like missing socks, pickup lines, and other essential puzzles. J Corporate Account Finance 26(6):97–99
Benmessahel I, Xie K, Chellal M (2018) A new evolutionary neural networks based on intrusion detection systems using multiverse optimization. Appl Intell 48(8):2315–2327
Poole B, Lahiri S, Raghu M, Sohl-Dickstein J, Ganguli S (2016) Exponential expressivity in deep neural networks through transient chaos. In: Advances in neural information processing systems 29: Annual conference on neural information processing systems, neural information processing systems foundation, Barcelona, Spain, pp 3368–3376
Valdez MA, Jaschke D, Vargas DL, Carr LD (2017) Quantifying complexity in quantum phase transitions via mutual information complex networks. Phys Rev Lett 119(22):225301
Raghu M, Poole B, Kleinberg JM, Ganguli S, Sohl-Dickstein J (2017) On the expressive power of deep neural networks. In: Proceedings of the 34th International Conference on Machine Learning, pp 2847–2854
Mafahim JU, Lambert D, Zare M, Grigolini P (2015) Complexity matching in neural networks. New J Phys 17(1):1–18
Azizipour M, Afshar MH (2018) Reliability-based operation of reservoirs: a hybrid genetic algorithm and cellular automata method. Soft Comput 22(19):6461–6471
Erkaymaz O, Ozer M, Perc M (2017) Performance of small-world feedforward neural networks for the diagnosis of diabetes. Appl Math Comput 311:22–28
Wang SX, Li M, Zhao L, Jin C (2019) Short-term wind power prediction based on improved small-world neural network. Neural Computing and Applications 31(7):3173–3185
Semwal VB, Gaud N, Nandi G (2019) Human gait state prediction using cellular automata and classification using ELM. In: Machine Intelligence and Signal Analysis. Springer, pp 135–145
Kossio FYK, Goedeke S, van den Akker B, Ibarz B, Memmesheimer RM (2018) Growing critical: Self-organized criticality in a developing neural system. Phys Rev Lett 121:058301
Hazan H, Saunders DJ, Sanghavi DT, Siegelmann HT, Kozma R (2018) Unsupervised learning with self-organizing spiking neural networks. In: 2018 International Joint Conference on Neural Networks, IJCNN, pp 1–6
Choromanska A, Henaff M, Mathieu M, Arous GB, LeCun Y (2015) The loss surfaces of multilayer networks. In: Proceedings of the Eighteenth International Conference on Artificial Intelligence and Statistics
Li SH, Wang L (2018) Neural network renormalization group. Phys Rev Lett 121(26):260601
Deng DL, Li X, Sarma SD (2017) Quantum entanglement in neural network states. Phys Rev X 7(2):021021
Iso S, Shiba S, Yokoo S (2018) Scale-invariant feature extraction of neural network and renormalization group flow. Phys Rev E 97(5)
Yang G, Schoenholz S (2017a) Mean field residual networks: on the edge of chaos. In: Advances in Neural Information Processing Systems, pp 7103–7114
Yang G, Schoenholz SS (2017b) Mean field residual networks: On the edge of chaos. In: Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, pp 2865–2873
Kawamoto T, Tsubaki M, Obuchi T (2018) Mean-field theory of graph neural networks in graph partitioning. In: Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS, pp 4366–4376
Carleo G, Troyer M (2017) Solving the quantum many-body problem with artificial neural networks. Science 355(6325):602–606
Koch-Janusz M, Ringel Z (2018) Mutual information, neural networks and the renormalization group. Nat Phys 14(6):578–582
Efthymiou S, Beach MJS, Melko RG (2019) Super-resolving the ising model with convolutional neural networks. Phys Rev B 99:075113
Zhang H, Wang Z, Liu D (2014) A comprehensive review of stability analysis of continuous-time recurrent neural networks. IEEE Trans Neural Netw Learn Syst 25(7):1229–1262
Elfwing S, Uchibe E, Doya K (2018) Sigmoid-weighted linear units for neural network function approximation in reinforcement learning. Neural Netw 107:3–11
Njikam ANS, Zhao H (2016) A novel activation function for multilayer feed-forward neural networks. Appl Intell 45(1):75–82
Halmos PR (2012) A Hilbert Space Problem Book, vol 19. Springer Science & Business Media
Petersen A, Muller HG (2016) Functional data analysis for density functions by transformation to a hilbert space. Ann Stat 44(1):183–218
Chen M, Pennington J, Schoenholz SS (2018) Dynamical isometry and a mean field theory of RNNs: Gating enables signal propagation in recurrent neural networks. In: Proceedings of the 35th International Conference on Machine Learning, ICML, pp 872–881
Gupta C, Jain A, Tayal DK, Castillo O (2018) Clusfude: Forecasting low dimensional numerical data using an improved method based on automatic clustering, fuzzy relationships and differential evolution. Eng Appl of AI 71:175–189
Bianchi FM, Livi L, Alippi C (2018) Investigating echo-state networks dynamics by means of recurrence analysis. IEEE Trans Neural Netw Learn Syst 29(2):427–439
Mocanu DC, Mocanu E, Stone P, Nguyen PH, Gibescu M, Liotta A (2018) Scalable training of artificial neural networks with adaptive sparse connectivity inspired by network science. Nat Commun 9(1):2383
Stanley KO, Clune J, Lehman J, Miikkulainen R (2019) Designing neural networks through neuroevolution. Nat Mach Intell 1(1):24–35
Valdez F, Vázquez JC, Melin P, Castillo O (2017) Comparative study of the use of fuzzy logic in improving particle swarm optimization variants for mathematical functions using co-evolution. Appl Soft Comput 52:1070–1083
Soto J, Melin P, Castillo O (2018) A new approach for time series prediction using ensembles of IT2FNN models with optimization of fuzzy integrators. Int J Fuzzy Syst 20(3):701–728
Radosavljević J (2016) A solution to the combined economic and emission dispatch using hybrid PSOGSA algorithm. Appl Artif Intell 30(5):445–474
Olivas F, Valdez F, Melin P, Sombra A, Castillo O (2019) Interval type-2 fuzzy logic for dynamic parameter adaptation in a modified gravitational search algorithm. Inf Sci 476:159–175
Beilock SL, DeCaro MS (2007) From poor performance to success under stress: Working memory, strategy selection, and mathematical problem solving under pressure. J Exper Psychol Learn Memory Cogn 33(6):983
Mantegna RN, Stanley HE (1994) Stochastic process with ultraslow convergence to a gaussian: The truncated lévy flight. Phys Rev Lett 73(22):2946
Yang G, Pennington J, Rao V, Sohl-Dickstein J, Schoenholz SS (2019) A mean field theory of batch normalization. In: International Conference on Learning Representations
Kreyszig E (1978) Introductory Functional Analysis with Applications. Wiley, New York
O'Donnell R (2013) Analysis of Boolean Functions. Cambridge University Press, Cambridge
Nazemi A, Mortezaee M (2019) A new gradient-based neural dynamic framework for solving constrained min-max optimization problems with an application in portfolio selection models. Applied Intelligence 49(2):396–419
Acknowledgment
This work was supported by the National Science Foundation of China (61473183, 61521063, 61627810), the National Key R&D Program of China (SQ2017YFGH001005), the Scientific and Technological Project of Henan Province (172102210255), and the CERNET Innovation Project (No. NGII20160517).
Appendices
Appendix A: Approaching equilibrium points by Hermite polynomials
The proofs of all the theorems we quote can be found in Chapters 2 and 5 of [57] or similar textbooks. A complete normed vector space is called a Banach space; a Hilbert space is a Banach space whose norm is induced by an inner product. Let H be the Hilbert space of functions from S to \(\mathbb {R}\). The following Contraction-Mapping Theorem, also known as Banach's Fixed-Point Theorem, describes the existence and uniqueness of solutions of differential equations.
Theorem 1
Let \(T:S\rightarrow S\) be a contraction mapping. Then the equation Tx = x has exactly one solution in S, and this unique solution x can be obtained as the limit of the sequence x(n) defined by \(x(n) = Tx(n-1),\ n= 1,2,\dots \), expressed as:

\(x = \lim _{n\to \infty } x(n) = \lim _{n\to \infty } T^{n}x_{0},\)

where x0 is an arbitrary initial element in S.
The theorem not only illustrates the existence and uniqueness of solutions to differential equations, but also provides a way to find solutions by iterative processes. The following is a constructive extension of the theorem.
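To make the iterative process concrete, the following is a minimal Python sketch (ours, not part of the original derivation; the example contraction \(T(x)=\cos x\) and the function names are illustrative only) that computes the fixed point by repeatedly applying T to an arbitrary initial element.

```python
import numpy as np

def fixed_point(T, x0, tol=1e-10, max_iter=1000):
    """Iterate x(n) = T(x(n-1)) until successive iterates differ by less than tol.

    By the Contraction-Mapping Theorem, if T is a contraction on a complete
    space, this sequence converges to the unique solution of T(x) = x for any
    initial element x0.
    """
    x = x0
    for _ in range(max_iter):
        x_next = T(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    return x

# T(x) = cos(x) is a contraction in a neighbourhood of its fixed point,
# so the iteration converges to the unique solution of cos(x) = x.
print(fixed_point(np.cos, x0=0.0))  # ~0.739085
```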
Lemma 1
Suppose L is a self-adjoint operator on H and there exist constants B ≥ A > 0 satisfying

\(A\|x\|^{2} \le \langle Lx, x\rangle \le B\|x\|^{2}\) for all x,          (17)

then L is invertible, and we have

\(B^{-1}\|x\|^{2} \le \langle L^{-1}x, x\rangle \le A^{-1}\|x\|^{2}.\)          (18)
The inequality (17) shows that the eigenvalues of L lie between A and B. In finite dimension, L is diagonalized in an orthonormal basis since it is self-adjoint. It is therefore invertible, and the eigenvalues of \(L^{-1}\) lie between \(B^{-1}\) and \(A^{-1}\), which proves (18).
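In finite dimension, Lemma 1 can be checked numerically; the following sketch (an illustration we add under the assumption that L is a symmetric positive-definite matrix) verifies that the spectrum of \(L^{-1}\) is contained in \([B^{-1}, A^{-1}]\).

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
L = M @ M.T + 0.5 * np.eye(5)      # symmetric (self-adjoint) and positive definite

eig_L = np.linalg.eigvalsh(L)      # eigenvalues in ascending order
A, B = eig_L[0], eig_L[-1]         # the constants in inequality (17)

eig_L_inv = np.linalg.eigvalsh(np.linalg.inv(L))
# Inequality (18): the eigenvalues of L^{-1} lie between 1/B and 1/A.
assert np.isclose(eig_L_inv[0], 1.0 / B) and np.isclose(eig_L_inv[-1], 1.0 / A)
print(A, B, eig_L_inv[0], eig_L_inv[-1])
```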
Taking the probabilists' weight function \(p(x) = e^{-x^{2}/2}\) and applying Lemma 1, it follows that the Hermite polynomials are orthogonal on the interval \((-\infty , \infty )\) with respect to this weight function. We then obtain the following important result: every f ∈ H can be expanded as \(f(x) = {\sum }_{n\ge 0} a_{n} H_{n}(x)\), with Hermite coefficients

\(a_{n} = \langle f, H_{n}\rangle = \frac {1}{\sqrt {2\pi }}{\int }_{-\infty }^{\infty } f(x) H_{n}(x) e^{-x^{2}/2}\,dx.\)          (19)
We use the following facts about the Hermite polynomials (see Chapter 11 in [58]): \(H_{0}(x) = 1\), \(H_{1}(x) = x\), \(H_{n+1}(x) = \frac {1}{\sqrt {n+1}}\left (x H_{n}(x) - \sqrt {n}\, H_{n-1}(x)\right )\), \(H_{n}^{\prime }(x) = \sqrt {n}\, H_{n-1}(x)\), and

\(\frac {d}{dx}\left [H_{n}(x)\, e^{-x^{2}/2}\right ] = -\sqrt {n+1}\, H_{n+1}(x)\, e^{-x^{2}/2}.\)          (21)
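The following Python sketch (ours, assuming the normalized probabilists' Hermite polynomials used throughout this appendix) generates \(H_{n}\) by the three-term recurrence above and numerically checks their orthonormality with respect to the weight \(e^{-x^{2}/2}/\sqrt {2\pi }\).

```python
import numpy as np

def hermite_normalized(n, x):
    """Normalized probabilists' Hermite polynomial H_n evaluated at x.

    H_0 = 1, H_1 = x, H_2 = (x^2 - 1)/sqrt(2), and
    H_{k+1}(x) = (x*H_k(x) - sqrt(k)*H_{k-1}(x)) / sqrt(k+1).
    """
    H_prev, H = np.ones_like(x), x
    if n == 0:
        return H_prev
    for k in range(1, n):
        H_prev, H = H, (x * H - np.sqrt(k) * H_prev) / np.sqrt(k + 1)
    return H

# Orthonormality under the Gaussian weight e^{-x^2/2}/sqrt(2*pi).
x = np.linspace(-12.0, 12.0, 200001)
w = np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)
for m in range(4):
    for n in range(4):
        inner = np.trapz(hermite_normalized(m, x) * hermite_normalized(n, x) * w, x)
        assert abs(inner - float(m == n)) < 1e-4
```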
Appendix B: Examples
Next, we analyze the eigenvalues of several popular activation functions in the Hermite polynomial basis and their effects on the convergence and criticality of neural networks.
1.1 Sigmoid activation
The Sigmoid function is \( \sigma (x)= \frac {1} {e^{-x} + 1} \). Since \(H_{0}(x) = 1, H_{1}(x) = x, \text { and } H_{2}(x) = \frac {x^{2}-1}{\sqrt {2}}\), substituting the Sigmoid activation into (19) according to the orthogonality of the Hermite polynomials gives the low-order Hermite coefficients, expressed as:
For n ≥ 3, we write \(g_{n}(x) = \frac {e^{-x^{2}/2}}{(1+e^{-x})}H_{n}(x)\); according to (21), the derivative of gn(x) is:
Since (25) tends to 0 as \(x\to \infty \), we therefore get:
Therefore, the Hermite coefficients of the Sigmoid activation are expressed as:
We find that the Sigmoid activation attenuates values of higher magnitude. When such attenuation is multiplied across many layers, the overall gradient becomes quite small.
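The attenuation can be seen numerically; continuing the previous sketch (it reuses the hermite_normalized helper defined there), the coefficients \(a_{n} = \langle \sigma , H_{n}\rangle \) from (19) are computed by quadrature and decay rapidly with n.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-12.0, 12.0, 200001)
w = np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)

# Hermite coefficients a_n = <sigmoid, H_n> under the Gaussian weight, as in (19).
coeffs = [np.trapz(sigmoid(x) * hermite_normalized(n, x) * w, x) for n in range(8)]
print(np.round(coeffs, 4))
# a_0 = 1/2 exactly; the remaining even coefficients vanish by symmetry and the
# odd ones shrink quickly, so products of such factors across layers become small.
```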
1.2 Normalized ReLU activation
Consider the unit activation \({f}(x) = \sqrt {2}\max \limits (0, x)\). Substituting the Hermite polynomials into (19) gives the corresponding coefficients:
For n ≥ 3, we write \(g_{n}(x) = x H_{n}(x) e^{-\frac {x^{2}}{2}}\), and its derivative is:
Since (30) tends to 0 as \(x\to \infty \), we therefore get:
Therefore, the Hermite coefficients of the ReLU activation are expressed as:
The maximum eigenvalue is \(\frac {1}{\sqrt {2}}\), and the eigenvalues then gradually decay to the critical point 0. In particular, \((a_{0}, a_{1}, a_{2}, a_{3}, a_{4}, a_{5}, a_{6}) = (\frac {1}{\sqrt {\pi }}, \frac {1}{\sqrt {2}}, \frac {1}{\sqrt {2\pi }}, 0, \frac {1}{\sqrt {24\pi }}, 0, \frac {1}{\sqrt {80\pi }})\). We also see that zero eigenvalues occur at intervals of one, which may cause the network to stop passing information at saddle points rather than reaching the expected global optimum, especially under the gradient descent method [59].
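As a sanity check on the values listed above, the following sketch (ours, again reusing the hermite_normalized helper) computes the coefficients of \(f(x) = \sqrt {2}\max (0, x)\) by quadrature; the magnitudes agree with the closed-form values quoted in this subsection.

```python
import numpy as np

def relu_unit(x):
    return np.sqrt(2.0) * np.maximum(0.0, x)

x = np.linspace(-12.0, 12.0, 200001)
w = np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)

# Hermite coefficients a_n = <f, H_n> of the normalized ReLU, as in (19).
numeric = np.array([np.trapz(relu_unit(x) * hermite_normalized(n, x) * w, x)
                    for n in range(7)])
listed = np.array([1 / np.sqrt(np.pi), 1 / np.sqrt(2), 1 / np.sqrt(2 * np.pi),
                   0.0, 1 / np.sqrt(24 * np.pi), 0.0, 1 / np.sqrt(80 * np.pi)])

print(np.round(numeric, 4))
# The magnitudes match the listed values (the quadrature gives a_4 with a
# negative sign but the same magnitude); a_3 and a_5 are zero, illustrating
# the alternating zero coefficients discussed above.
assert np.allclose(np.abs(numeric), listed, atol=1e-4)
```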