
DBN based SD-ARX model for nonlinear time series prediction and analysis


Abstract

One of the main purposes of nonlinear system modeling is to design model-based controllers such as model predictive control (MPC). In this paper, a group of deep belief networks (DBNs) is used to approximate the function-type coefficients of a state-dependent autoregressive model with exogenous variables (SD-ARX), which can represent nonlinear dynamics; the result is a DBN-based state-dependent ARX (DBN-ARX) model. The DBN-ARX model combines the function-approximation ability of a single DBN with the nonlinear description capability of the SD-ARX model. All parameters of the DBN-ARX model are estimated by pre-training and fine-tuning strategies, and the stability condition of the model is also discussed. The proposed DBN-ARX model is a pseudo-linear ARX model identified offline, whose function-type coefficients are composed of operating-point-dependent DBNs. The usefulness of the DBN-ARX model is illustrated by modeling a continuously stirred tank reactor (CSTR) time series, the Box and Jenkins data, a nonlinear process and a water tank system. The four experiments show that the one-step-ahead and multi-step-ahead prediction accuracy of the proposed DBN-ARX model is improved compared with several existing models.




Acknowledgments

The authors would like to thank the editors and referees for their valuable comments and suggestions, which substantially improved the original manuscript. The work presented in this paper was supported by the National Natural Science Foundation of China (Grant Nos. 61773402, 51575167 and 61540037) and the Anhui Provincial Natural Science Foundation (2008085MF197).

Author information

Corresponding author

Correspondence to Hui Peng.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Fine-tuning process of the proposed DBN-ARX model

The objective function used in the fine-tuning stage is based on Eq. (13) and is defined as follows

$$ E(t)=\frac{1}{2}{\xi}^2(t)=\frac{1}{2}{\left(y(t)-\hat{y}(t)\right)}^2=\frac{1}{2}{\left(y(t)-{\hat{\phi}}_0\left(\varGamma \left(t-1\right)\right)-\sum \limits_{m=1}^{n_y}{\hat{\phi}}_{y,m}\left(\varGamma \left(t-1\right)\right)y\left(t-m\right)-\sum \limits_{n=1}^{n_u}{\hat{\phi}}_{u,n}\left(\varGamma \left(t-1\right)\right)u\left(t-n\right)\right)}^2,t={n}_y+1,{n}_y+2,\cdots, N $$
(22)

where y(t) and \( \hat{y}(t) \) are actual and predicted values, respectively.
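To fix ideas, the following minimal Python sketch evaluates the pseudo-linear DBN-ARX one-step prediction and the objective of Eq. (22). It is an illustration under assumed names (`dbn_coeffs`, `gamma`, etc.), not the authors' implementation; each coefficient is produced by a DBN module evaluated at the operating-point vector Γ(t−1).

```python
def dbn_arx_predict(dbn_coeffs, gamma, y_hist, u_hist):
    """One-step-ahead DBN-ARX prediction (illustrative sketch).

    dbn_coeffs : list of callables; dbn_coeffs[j] maps the operating-point
                 vector Gamma(t-1) to the scalar coefficient phi_j, with
                 j = 0 the constant term, j = 1..n_y the autoregressive
                 coefficients and j = n_y+1..n_y+n_u the input coefficients.
    y_hist[m-1] = y(t-m), u_hist[n-1] = u(t-n).
    """
    n_y, n_u = len(y_hist), len(u_hist)
    y_hat = dbn_coeffs[0](gamma)                       # phi_0(Gamma(t-1))
    for m in range(1, n_y + 1):                        # autoregressive part
        y_hat += dbn_coeffs[m](gamma) * y_hist[m - 1]
    for n in range(1, n_u + 1):                        # exogenous-input part
        y_hat += dbn_coeffs[n_y + n](gamma) * u_hist[n - 1]
    return y_hat

def objective(y_t, y_hat_t):
    """E(t) = 0.5 * (y(t) - y_hat(t))^2, cf. Eq. (22)."""
    return 0.5 * (y_t - y_hat_t) ** 2
```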

The traditional gradient descent method is used to fine-tune the parameters of the DBN-ARX model. For the output neuron in the \( {N}_r^{(j)} \)-th layer, Eq. (23) is used to compute the gradient for parameter updating.

$$ \frac{\partial E(t)}{\partial {w}_{1,{n}_{N_r^{(j)}-1}^{(j)},j}^{\left({N}_r^{(j)}\right)}}=\frac{\partial E(t)}{\partial \xi (t)}\frac{\partial \xi (t)}{\partial \hat{y}(t)}\frac{\partial \hat{y}(t)}{\partial {\hat{\phi}}_j(t)}\frac{\partial {\hat{\phi}}_j(t)}{\partial {u}_{1,j}^{\left({N}_r^{(j)}\right)}(t)}\frac{\partial {u}_{1,j}^{\left({N}_r^{(j)}\right)}(t)}{\partial {w}_{1,{n}_{N_r^{(j)}-1}^{(j)},j}^{\left({N}_r^{(j)}\right)}}\kern4em =-\xi (t)a\left(t-j\right){\varphi}^{\prime}\left({u}_{1,j}^{\left({N}_r^{(j)}\right)}(t)\right){h}_{n_{N_r^{(j)}-1}^{(j)},j}^{\left({N}_r^{(j)}-1\right)}(t)={\varphi}^{\prime}\left({u}_{1,j}^{\left({N}_r^{(j)}\right)}(t)\right)c\left(t-j\right){h}_{n_{N_r^{(j)}-1}^{(j)},j}^{\left({N}_r^{(j)}-1\right)}(t) $$
(23)

where

$$ {\displaystyle \begin{array}{c}{\hat{\phi}}_0(t)={\hat{\phi}}_0\left(\varGamma \left(t-1\right)\right);\\ {}{\hat{\phi}}_1(t)={\hat{\phi}}_{y,1}\left(\varGamma \left(t-1\right)\right);\\ {}\vdots \\ {}{\hat{\phi}}_{n_y}(t)={\hat{\phi}}_{y,{n}_y}\left(\varGamma \left(t-1\right)\right);\\ {}{\hat{\phi}}_{n_y+1}(t)={\hat{\phi}}_{u,1}\left(\varGamma \left(t-1\right)\right);\\ {}\vdots \\ {}{\hat{\phi}}_{n_y+{n}_u}(t)={\hat{\phi}}_{u,{n}_u}\left(\varGamma \left(t-1\right)\right);\end{array}} $$
$$ a(t)=1;\kern1em a\left(t-j\right)=y\left(t-j\right),j=1,2,\cdots, {n}_y;\kern1em a\left(t-j\right)=u\left(t-j+{n}_y\right),j={n}_y+1,\cdots, {n}_y+{n}_u $$
$$ c\left(t-j\right)=-\xi (t)a\left(t-j\right),j=0,1,2,\cdots, \left({n}_y+{n}_u\right) $$

and φ′(u) is the derivative of φ(u) with respect to u. Let

$$ {\delta}_{1,j}^{\left({N}_r^{(j)}\right)}(t)={\varphi}^{\prime}\left({u}_{1,j}^{\left({N}_r^{(j)}\right)}\right)c\left(t-j\right) $$
(24)

According to the chain rule of calculus, the gradient with respect to the last-layer weight of the j-th DBN module can be rewritten as

$$ \frac{\partial E(t)}{\partial {w}_{1,{n}_{N_r^{(j)}-1}^{(j)},j}^{\left({N}_r^{(j)}\right)}}={\delta}_{1,j}^{\left({N}_r^{(j)}\right)}(t){h}_{n_{N_r^{(j)}-1}^{(j)},j}^{\left({N}_r^{(j)}-1\right)}(t) $$
(25)

Similarly, we have

$$ \frac{\partial E(t)}{\partial {b}_{1,j}^{\left({N}_r^{(j)}\right)}}=\frac{\partial E(t)}{\partial \xi (t)}\frac{\partial \xi (t)}{\partial \hat{y}(t)}\frac{\partial \hat{y}(t)}{\partial {\hat{\phi}}_j(t)}\frac{\partial {\hat{\phi}}_j(t)}{\partial {u}_{1,j}^{\left({N}_r^{(j)}\right)}(t)}\frac{\partial {u}_{1,j}^{\left({N}_r^{(j)}\right)}(t)}{\partial {b}_{1,j}^{\left({N}_r^{(j)}\right)}}\kern2.75em =-\xi (t)a\left(t-j\right){\varphi}^{\prime}\left({u}_{1,j}^{\left({N}_r^{(j)}\right)}(t)\right)={\varphi}^{\prime}\left({u}_{1,j}^{\left({N}_r^{(j)}\right)}\right)c\left(t-j\right)={\delta}_{1,j}^{\left({N}_r^{(j)}\right)}(t) $$
(26)
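For concreteness, here is a minimal NumPy sketch of the top-layer gradient computations of Eqs. (23)–(26). The function and argument names are assumptions for illustration, not the paper's code.

```python
import numpy as np

def output_layer_grads(xi_t, a_tj, u_top, h_prev, dphi):
    """Top-layer gradients of the j-th DBN module (sketch of Eqs. (23)-(26)).

    xi_t   : prediction error xi(t) = y(t) - y_hat(t)
    a_tj   : regressor a(t-j) multiplying phi_j in the model output
    u_top  : pre-activation u_{1,j}^{(N_r)} of the single output neuron
    h_prev : outputs h^{(N_r - 1)} of the last hidden layer (1-D array)
    dphi   : derivative phi'(.) of the activation function
    """
    c_tj = -xi_t * a_tj                 # c(t-j) = -xi(t) a(t-j)
    delta_top = dphi(u_top) * c_tj      # Eq. (24): local gradient of output neuron
    grad_w = delta_top * h_prev         # Eq. (25): dE/dw, one entry per hidden unit
    grad_b = delta_top                  # Eq. (26): dE/db
    return grad_w, grad_b, delta_top

# Example with a tanh activation:
gw, gb, d = output_layer_grads(
    xi_t=0.1, a_tj=0.5, u_top=0.2,
    h_prev=np.array([0.3, -0.7]),
    dphi=lambda u: 1.0 - np.tanh(u) ** 2,
)
```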

With regard to neuron \( {n}_{N_r^{(j)}-1}^{(j)}\in \left\{1,2,\cdots, {Q}_{N_r^{(j)}-1}^{(j)}\right\} \) in the (\( {N}_r^{(j)}-1 \))-th hidden layer, Eq. (27) can be used to compute the gradient for parameter updating.

$$ \frac{\partial E(t)}{\partial {w}_{n_{N_r^{(j)}-1}^{(j)},{n}_{N_r^{(j)}-2}^{(j)},j}^{\left({N}_r^{(j)}-1\right)}}=\frac{\partial E(t)}{\partial \xi (t)}\frac{\partial \xi (t)}{\partial \hat{y}(t)}\frac{\partial \hat{y}(t)}{\partial {\hat{\phi}}_j(t)}\frac{\partial {\hat{\phi}}_j(t)}{\partial {u}_{1,j}^{\left({N}_r^{(j)}\right)}(t)}\frac{\partial {u}_{1,j}^{\left({N}_r^{(j)}\right)}(t)}{\partial {h}_{n_{N_r^{(j)}-1}^{(j)},j}^{\left({N}_r^{(j)}-1\right)}(t)}\frac{\partial {h}_{n_{N_r^{(j)}-1}^{(j)},j}^{\left({N}_r^{(j)}-1\right)}(t)}{\partial {u}_{n_{N_r^{(j)}-1}^{(j)},j}^{\left({N}_r^{(j)}-1\right)}(t)}\frac{\partial {u}_{n_{N_r^{(j)}-1}^{(j)},j}^{\left({N}_r^{(j)}-1\right)}(t)}{\partial {w}_{n_{N_r^{(j)}-1}^{(j)},{n}_{N_r^{(j)}-2}^{(j)},j}^{\left({N}_r^{(j)}-1\right)}}=-\xi (t)a\left(t-j\right){\varphi}^{\prime}\left({u}_{1,j}^{\left({N}_r^{(j)}\right)}(t)\right){w}_{1,{n}_{N_r^{(j)}-1}^{(j)},j}^{\left({N}_r^{(j)}\right)}{\varphi}^{\prime}\left({u}_{n_{N_r^{(j)}-1}^{(j)},j}^{\left({N}_r^{(j)}-1\right)}(t)\right){h}_{n_{N_r^{(j)}-2}^{(j)},j}^{\left({N}_r^{(j)}-2\right)}(t)={\varphi}^{\prime}\left({u}_{n_{N_r^{(j)}-1}^{(j)},j}^{\left({N}_r^{(j)}-1\right)}(t)\right){w}_{1,{n}_{N_r^{(j)}-1}^{(j)},j}^{\left({N}_r^{(j)}\right)}{\varphi}^{\prime}\left({u}_{1,j}^{\left({N}_r^{(j)}\right)}(t)\right)c\left(t-j\right){h}_{n_{N_r^{(j)}-2}^{(j)},j}^{\left({N}_r^{(j)}-2\right)}(t)={\varphi}^{\prime}\left({u}_{n_{N_r^{(j)}-1}^{(j)},j}^{\left({N}_r^{(j)}-1\right)}(t)\right){w}_{1,{n}_{N_r^{(j)}-1}^{(j)},j}^{\left({N}_r^{(j)}\right)}{\delta}_{1,j}^{\left({N}_r^{(j)}\right)}(t){h}_{n_{N_r^{(j)}-2}^{(j)},j}^{\left({N}_r^{(j)}-2\right)}(t) $$
(27)

Let

$$ {\delta}_{n_{N_r^{(j)}-1}^{(j)},j}^{\left({N}_r^{(j)}-1\right)}(t)={\varphi}^{\prime}\left({u}_{n_{N_r^{(j)}-1}^{(j)},j}^{\left({N}_r^{(j)}-1\right)}(t)\right){w}_{1,{n}_{N_r^{(j)}-1}^{(j)},j}^{\left({N}_r^{(j)}\right)}{\delta}_{1,j}^{\left({N}_r^{(j)}\right)}(t) $$
(28)

then Eq. (27) becomes

$$ \frac{\partial E(t)}{\partial {w}_{n_{N_r^{(j)}-1}^{(j)},{n}_{N_r^{(j)}-2}^{(j)},j}^{\left({N}_r^{(j)}-1\right)}}={\delta}_{n_{N_r^{(j)}-1}^{(j)},j}^{\left({N}_r^{(j)}-1\right)}(t){h}_{n_{N_r^{(j)}-2}^{(j)},j}^{\left({N}_r^{(j)}-2\right)}(t) $$
(29)

and

$$ {\displaystyle \begin{array}{c}\frac{\partial E(t)}{\partial {b}_{n_{N_r^{(j)}-1}^{(j)},j}^{\left({N}_r^{(j)}-1\right)}}=\frac{\partial E(t)}{\partial \xi (t)}\frac{\partial \xi (t)}{\partial \hat{y}(t)}\frac{\partial \hat{y}(t)}{\partial {\hat{\phi}}_j(t)}\frac{\partial {\hat{\phi}}_j(t)}{\partial {u}_{1,j}^{\left({N}_r^{(j)}\right)}(t)}\frac{\partial {u}_{1,j}^{\left({N}_r^{(j)}\right)}(t)}{\partial {h}_{n_{N_r^{(j)}-1}^{(j)},j}^{\left({N}_r^{(j)}-1\right)}(t)}\frac{\partial {h}_{n_{N_r^{(j)}-1}^{(j)},j}^{\left({N}_r^{(j)}-1\right)}(t)}{\partial {u}_{n_{N_r^{(j)}-1}^{(j)},j}^{\left({N}_r^{(j)}-1\right)}(t)}\frac{\partial {u}_{n_{N_r^{(j)}-1}^{(j)},j}^{\left({N}_r^{(j)}-1\right)}(t)}{\partial {b}_{n_{N_r^{(j)}-1}^{(j)},j}^{\left({N}_r^{(j)}-1\right)}}\\ {}=-\xi (t)a\left(t-j\right){\varphi}^{\prime}\left({u}_{1,j}^{\left({N}_r^{(j)}\right)}(t)\right){w}_{1,{n}_{N_r^{(j)}-1}^{(j)},j}^{\left({N}_r^{(j)}\right)}{\varphi}^{\prime}\left({u}_{n_{N_r^{(j)}-1}^{(j)},j}^{\left({N}_r^{(j)}-1\right)}(t)\right)\\ {}={\varphi}^{\prime}\left({u}_{n_{N_r^{(j)}-1}^{(j)},j}^{\left({N}_r^{(j)}-1\right)}(t)\right){w}_{1,{n}_{N_r^{(j)}-1}^{(j)},j}^{\left({N}_r^{(j)}\right)}{\varphi}^{\prime}\left({u}_{1,j}^{\left({N}_r^{(j)}\right)}(t)\right)c\left(t-j\right)\\ {}={\delta}_{n_{N_r^{(j)}-1}^{(j)},j}^{\left({N}_r^{(j)}-1\right)}(t)\end{array}} $$
(30)

With regard to neuron \( {n}_{N_r^{(j)}-2}^{(j)}\in \left\{1,2,\cdots, {Q}_{N_r^{(j)}-2}^{(j)}\right\} \) in the (\( {N}_r^{(j)}-2 \))-th hidden layer, the gradient for parameter updating is computed by Eq. (31).

$$ {\displaystyle \begin{array}{c}\frac{\partial E(t)}{\partial {w}_{n_{N_r^{(j)}-2}^{(j)},{n}_{N_r^{(j)}-3}^{(j)},j}^{\left({N}_r^{(j)}-2\right)}}=\frac{\partial E(t)}{\partial \xi (t)}\frac{\partial \xi (t)}{\partial \hat{y}(t)}\frac{\partial \hat{y}(t)}{\partial {\hat{\phi}}_j(t)}\sum \limits_{v=1}^{Q_{N_r^{(j)}-1}^{(j)}}\frac{\partial {\hat{\phi}}_j(t)}{\partial {h}_{v,j}^{\left({N}_r^{(j)}-1\right)}(t)}\frac{\partial {h}_{v,j}^{\left({N}_r^{(j)}-1\right)}(t)}{\partial {h}_{n_{N_r^{(j)}-2}^{(j)},j}^{\left({N}_r^{(j)}-2\right)}(t)}\frac{\partial {h}_{n_{N_r^{(j)}-2}^{(j)},j}^{\left({N}_r^{(j)}-2\right)}(t)}{\partial {u}_{n_{N_r^{(j)}-2}^{(j)},j}^{\left({N}_r^{(j)}-2\right)}(t)}\frac{\partial {u}_{n_{N_r^{(j)}-2}^{(j)},j}^{\left({N}_r^{(j)}-2\right)}(t)}{\partial {w}_{n_{N_r^{(j)}-2}^{(j)},{n}_{N_r^{(j)}-3}^{(j)},j}^{\left({N}_r^{(j)}-2\right)}}\\ {}=-\xi (t)a\left(t-j\right)\sum \limits_{v=1}^{Q_{N_r^{(j)}-1}^{(j)}}{\varphi}^{\prime}\left({u}_{1,j}^{\left({N}_r^{(j)}\right)}\right){w}_{1,v,j}^{\left({N}_r^{(j)}\right)}{\varphi}^{\prime}\left({u}_{v,j}^{\left({N}_r^{(j)}-1\right)}\right){w}_{v,{n}_{N_r^{(j)}-2}^{(j)},j}^{\left({N}_r^{(j)}-1\right)}{\varphi}^{\prime}\left({u}_{n_{N_r^{(j)}-2}^{(j)},j}^{\left({N}_r^{(j)}-2\right)}\right){h}_{n_{N_r^{(j)}-3}^{(j)},j}^{\left({N}_r^{(j)}-3\right)}(t)\\ {}=\sum \limits_{v=1}^{Q_{N_r^{(j)}-1}^{(j)}}{\varphi}^{\prime}\left({u}_{n_{N_r^{(j)}-2}^{(j)},j}^{\left({N}_r^{(j)}-2\right)}\right){w}_{v,{n}_{N_r^{(j)}-2}^{(j)},j}^{\left({N}_r^{(j)}-1\right)}{\delta}_{v,j}^{\left({N}_r^{(j)}-1\right)}(t){h}_{n_{N_r^{(j)}-3}^{(j)},j}^{\left({N}_r^{(j)}-3\right)}(t)\end{array}} $$
(31)

Let

$$ {\delta}_{n_{N_r^{(j)}-2}^{(j)},j}^{\left({N}_r^{(j)}-2\right)}(t)=\sum \limits_{v=1}^{Q_{N_r^{(j)}-1}^{(j)}}{\varphi}^{\prime}\left({u}_{n_{N_r^{(j)}-2}^{(j)},j}^{\left({N}_r^{(j)}-2\right)}\right){w}_{v,{n}_{N_r^{(j)}-2}^{(j)},j}^{\left({N}_r^{(j)}-1\right)}{\delta}_{v,j}^{\left({N}_r^{(j)}-1\right)}(t) $$
(32)

then Eq. (31) becomes

$$ \frac{\partial E(t)}{\partial {w}_{n_{N_r^{(j)}-2}^{(j)},{n}_{N_r^{(j)}-3}^{(j)},j}^{\left({N}_r^{(j)}-2\right)}}={\delta}_{n_{N_r^{(j)}-2}^{(j)},j}^{\left({N}_r^{(j)}-2\right)}(t){h}_{n_{N_r^{(j)}-3}^{(j)},j}^{\left({N}_r^{(j)}-3\right)}(t) $$
(33)

Similarly, we have

$$ {\displaystyle \begin{array}{c}\frac{\partial E(t)}{\partial {b}_{n_{N_r^{(j)}-2}^{(j)},j}^{\left({N}_r^{(j)}-2\right)}}=\frac{\partial E(t)}{\partial \xi (t)}\frac{\partial \xi (t)}{\partial \hat{y}(t)}\frac{\partial \hat{y}(t)}{\partial {\hat{\phi}}_j(t)}\sum \limits_{v=1}^{Q_{N_r^{(j)}-1}^{(j)}}\frac{\partial {\hat{\phi}}_j(t)}{\partial {h}_{v,j}^{\left({N}_r^{(j)}-1\right)}(t)}\frac{\partial {h}_{v,j}^{\left({N}_r^{(j)}-1\right)}(t)}{\partial {h}_{n_{N_r^{(j)}-2}^{(j)},j}^{\left({N}_r^{(j)}-2\right)}(t)}\frac{\partial {h}_{n_{N_r^{(j)}-2}^{(j)},j}^{\left({N}_r^{(j)}-2\right)}(t)}{\partial {u}_{n_{N_r^{(j)}-2}^{(j)},j}^{\left({N}_r^{(j)}-2\right)}(t)}\frac{\partial {u}_{n_{N_r^{(j)}-2}^{(j)},j}^{\left({N}_r^{(j)}-2\right)}(t)}{\partial {b}_{n_{N_r^{(j)}-2}^{(j)},j}^{\left({N}_r^{(j)}-2\right)}}\\ {}=-\xi (t)a\left(t-j\right)\sum \limits_{v=1}^{Q_{N_r^{(j)}-1}^{(j)}}{\varphi}^{\prime}\left({u}_{1,j}^{\left({N}_r^{(j)}\right)}\right){w}_{1,v,j}^{\left({N}_r^{(j)}\right)}{\varphi}^{\prime}\left({u}_{v,j}^{\left({N}_r^{(j)}-1\right)}\right){w}_{v,{n}_{N_r^{(j)}-2}^{(j)},j}^{\left({N}_r^{(j)}-1\right)}{\varphi}^{\prime}\left({u}_{n_{N_r^{(j)}-2}^{(j)},j}^{\left({N}_r^{(j)}-2\right)}\right)\\ {}=\sum \limits_{v=1}^{Q_{N_r^{(j)}-1}^{(j)}}{\varphi}^{\prime}\left({u}_{n_{N_r^{(j)}-2}^{(j)},j}^{\left({N}_r^{(j)}-2\right)}\right){w}_{v,{n}_{N_r^{(j)}-2}^{(j)},j}^{\left({N}_r^{(j)}-1\right)}{\delta}_{v,j}^{\left({N}_r^{(j)}-1\right)}(t)={\delta}_{n_{N_r^{(j)}-2}^{(j)},j}^{\left({N}_r^{(j)}-2\right)}(t)\end{array}} $$
(34)

Therefore, for the j-th DBN module, following the derivation above, Eq. (35) calculates the local gradient of each neuron in layer ℓj.

$$ {\delta}_{n_{\ell_j}^{(j)},j}^{\left({\ell}_j\right)}(t)=\sum \limits_{v=1}^{Q_{\ell_j+1}^{(j)}}{\varphi}^{\prime}\left({u}_{n_{\ell_j}^{(j)},j}^{\left({\ell}_j\right)}\right){w}_{v,{n}_{\ell_j}^{(j)},j}^{\left({\ell}_j+1\right)}{\delta}_{v,j}^{\left({\ell}_j+1\right)}(t),{\ell}_j\in \left\{1,2,\cdots, {N}_r^{(j)}-2\right\}, $$
(35)

The gradients with respect to the weights and the corresponding biases are then calculated by Eq. (36) and Eq. (37).

$$ \frac{\partial E(t)}{\partial {w}_{n_{\ell_j}^{(j)},{n}_{\ell_j-1}^{(j)},j}^{\left({\ell}_j\right)}}={\delta}_{n_{\ell_j}^{(j)},j}^{\left({\ell}_j\right)}(t){h}_{n_{\ell_j-1}^{(j)},j}^{\left({\ell}_j-1\right)}(t) $$
(36)
$$ \frac{\partial E(t)}{\partial {b}_{n_{\ell_j}^{(j)},j}^{\left({\ell}_j\right)}}={\delta}_{n_{\ell_j}^{(j)},j}^{\left({\ell}_j\right)}(t) $$
(37)
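The whole backward pass of Eqs. (35)–(37) through one DBN module can be sketched as follows. This is a sketch under assumed array layouts (the names `weights`, `us`, `hs` and their shapes are illustrative assumptions), not the authors' implementation.

```python
import numpy as np

def module_backward(delta_top, weights, us, hs, dphi):
    """Backward pass through one DBN module (sketch of Eqs. (35)-(37)).

    weights[l] : array of shape (Q_{l+1}, Q_l), the weights entering layer l+1
    us[l]      : pre-activations of layer l (us[0] is unused)
    hs[l]      : outputs of layer l (hs[0] is the module input)
    delta_top  : local gradient of the single output neuron, Eq. (24)
    dphi       : derivative phi'(.) of the activation function
    """
    n_layers = len(weights)                       # layers 1 .. N_r
    deltas = [None] * (n_layers + 1)
    deltas[n_layers] = np.atleast_1d(delta_top)
    # Eq. (35): propagate local gradients from layer l+1 down to layer l
    for l in range(n_layers - 1, 0, -1):
        deltas[l] = dphi(us[l]) * (weights[l].T @ deltas[l + 1])
    # Eqs. (36)-(37): gradients w.r.t. the weights and biases of each layer
    grads_w = [np.outer(deltas[l + 1], hs[l]) for l in range(n_layers)]
    grads_b = [deltas[l + 1] for l in range(n_layers)]
    return grads_w, grads_b
```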

The correction \( \varDelta {w}_{n_L^{(j)},{n}_{L-1}^{(j)},j}^{(L)} \) applied to \( {w}_{n_L^{(j)},{n}_{L-1}^{(j)},j}^{(L)} \) is defined by the delta rule. Accordingly, using Eqs. (24)–(26), (28)–(30) and (32)–(37) yields

$$ \varDelta {w}_{n_L^{(j)},{n}_{L-1}^{(j)},j}^{(L)}=-\eta \frac{\partial E(t)}{\partial {w}_{n_L^{(j)},{n}_{L-1}^{(j)},j}^{(L)}}=-\eta {\delta}_{n_L^{(j)},j}^{(L)}{h}_{n_{L-1}^{(j)},j}^{\left(L-1\right)}(t) $$
(38)
$$ \varDelta {b}_{n_L^{(j)},j}^{(L)}=-\eta \frac{\partial E(t)}{\partial {b}_{n_L^{(j)},j}^{(L)}}=-\eta {\delta}_{n_L^{(j)},j}^{(L)} $$
(39)

where \( L\in \left\{1,2,\cdots, {N}_r^{(j)}-1,{N}_r^{(j)}\right\} \) and η > 0 is a pre-determined learning rate. The weight and bias parameters are then updated according to the following rule

$$ \left\{\begin{array}{c}{w}_{n_L^{(j)},{n}_{L-1}^{(j)},j}^{(L)}\Leftarrow {w}_{n_L^{(j)},{n}_{L-1}^{(j)},j}^{(L)}+\varDelta {w}_{n_L^{(j)},{n}_{L-1}^{(j)},j}^{(L)}\\ {}{b}_{n_L^{(j)},j}^{(L)}\Leftarrow {b}_{n_L^{(j)},j}^{(L)}+\varDelta {b}_{n_L^{(j)},j}^{(L)}\end{array}\right. $$
(40)

where the initial values of the weight \( {w}_{n_L^{(j)},{n}_{L-1}^{(j)},j}^{(L)} \) and bias \( {b}_{n_L^{(j)},j}^{(L)} \) are obtained in the pre-training stage. In addition, to smooth the parameter updates and suppress oscillation, a momentum term is added to Eq. (40), yielding the final parameter-updating strategy

$$ \left\{\begin{array}{c}{w}_{n_L^{(j)},{n}_{L-1}^{(j)},j}^{(L)}(k)\Leftarrow {w}_{n_L^{(j)},{n}_{L-1}^{(j)},j}^{(L)}\left(k-1\right)+\varDelta {w}_{n_L^{(j)},{n}_{L-1}^{(j)},j}^{(L)}(k)+\alpha \left({w}_{n_L^{(j)},{n}_{L-1}^{(j)},j}^{(L)}\left(k-1\right)-{w}_{n_L^{(j)},{n}_{L-1}^{(j)},j}^{(L)}\left(k-2\right)\right)\\ {}{b}_{n_L^{(j)},j}^{(L)}(k)\Leftarrow {b}_{n_L^{(j)},j}^{(L)}\left(k-1\right)+\varDelta {b}_{n_L^{(j)},j}^{(L)}(k)+\alpha \left({b}_{n_L^{(j)},j}^{(L)}\left(k-1\right)-{b}_{n_L^{(j)},j}^{(L)}\left(k-2\right)\right)\end{array}\right. $$
(41)

where k is the number of updating iterations and α ∈ [0, 1) is a pre-determined momentum constant.
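A minimal sketch of the update rule of Eqs. (38)–(41), with illustrative names; it applies identically to the weights and the biases:

```python
def momentum_update(param, param_prev, grad, eta, alpha):
    """One gradient-descent step with momentum, following Eqs. (38)-(41).

    param, param_prev : parameter values at steps k-1 and k-2
    grad              : dE/dparam from the backward pass
    eta, alpha        : learning rate and momentum constant, alpha in [0, 1)
    """
    delta = -eta * grad                                        # Eqs. (38)-(39)
    param_new = param + delta + alpha * (param - param_prev)   # Eq. (41)
    return param_new, param     # (value at step k, value at step k-1)
```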

About this article

Cite this article

Xu, W., Peng, H., Tian, X. et al. DBN based SD-ARX model for nonlinear time series prediction and analysis. Appl Intell 50, 4586–4601 (2020). https://doi.org/10.1007/s10489-020-01804-2
