Modified gradient-based learning for local coupled feedforward neural networks with Gaussian basis function

Original Article · Neural Computing and Applications

Abstract

Local coupled feedforward neural networks (LCFNNs) structurally address the slow convergence and heavy computational cost of multi-layer perceptrons. This paper presents a modified gradient-based learning algorithm that further enhances the capabilities of LCFNNs, enabling an LCFNN to achieve good generalisation with higher learning efficiency. A theoretical analysis of the convergence of this algorithm is provided, showing that, as the number of learning iterations grows, the gradient of the error function decreases monotonically and tends to zero, while the weight sequence converges to a minimum of the given error function. Conditions on a constant learning rate that guarantee convergence are also specified. The work is verified with numerical experimental results.
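For orientation, the learning rule analysed in the paper is the constant-rate gradient step (cf. (12) and the gradient identities used in the proofs of (28) and (46) in the Appendix), applied jointly to the weight groups \({\bf u}\), \({\bf v}_{\bf i}\), \({\bf a}_{\bf i}\) and \({\bf b}_{\bf i}\):

$$ {\bf W}^{k+1}={\bf W}^{k}+\Updelta{\bf W}^{k},\quad \Updelta{\bf W}^{k}=-\eta\,\frac{\partial E({\bf W}^{k})}{\partial{\bf W}},\quad k=0,1,2,\ldots $$

The Appendix establishes that any constant learning rate \(0<\eta<1/C\), with C the constant defined there, guarantees monotone decrease of the error and convergence of the gradient to zero.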


References

1. Kathirvalakumar T, Thangavel P (2006) A modified backpropagation training algorithm for feedforward neural networks. Neural Process Lett 23:111–119

  2. Liu J, Feng D, Zhang W (2009) Adaptive improved natural gradient algorithm for blind source separation. Neural Comput 21:872–889

  3. Man Z, Wu H, Liu S, Yu X (2006) A new adaptive backpropagation algorithm based on Lyapunov stability theory for neural networks. IEEE Trans Neural Netw 17:1580–1591

  4. Navia-Vázquez A, Figueiras-Vidal A (2000) Efficient block training of multilayer perceptrons. Neural Comput 12:1429–1447

  5. Ng S, Cheung C, Leung S (2004) Magnified gradient function with deterministic weight modification in adaptive learning. IEEE Trans Neural Netw 15:1411–1423

  6. Wang W, Yu B (2009) Text categorization based on combination of modified back propagation neural network and latent semantic analysis. Neural Comput Appl 18:875–881

  7. Nunnari G (2006) An improved back propagation algorithm to predict episodes of poor air quality. Soft Comput 10:132–139

  8. Qiao J, Zhang Y, Han H (2008) Fast unit pruning algorithm for feedforward neural network design. Appl Math Comput 205:622–627

  9. Rubanov N (2000) The layer-wise method and the backpropagation hybrid approach to learning a feedforward neural network. IEEE Trans Neural Netw 11:295–304

  10. Siu S, Yang S, Lee C, Ho C (2007) Improving the back-propagation algorithm using evolutionary strategy. IEEE Trans Circuits Syst II: Express Briefs 54:171–175

  11. Hocenski Ž, Antunović M, Filko D (2010) Accelerated gradient learning algorithm for neural network weights update. Neural Comput Appl 19:219–225

  12. Yadav RN, Kalra PK, John J (2006) Neural network learning with generalized-mean based neuron model. Soft Comput 10:257–263

  13. Zhang J, Zhang J, Lok T, Lyu M (2007) A hybrid particle swarm optimization–back-propagation algorithm for feedforward neural network training. Appl Math Comput 185:1026–1037

  14. Zweiri Y (2007) Optimization of a three-term backpropagation algorithm used for neural network learning. Int J Comput Intell 3:322–327

  15. Sun J (1998) A new kind of feedforward neural network with advanced learning property. In: Proceedings of the 1998 artificial neural networks in engineering conference, intelligent engineering systems through artificial neural networks, vol 4, pp 81–84

  16. Sun J (2010) Local coupled feedforward neural network. Neural Netw 23:108–113

  17. Gori M, Maggini M (1996) Optimal convergence of on-line backpropagation. IEEE Trans Neural Netw 7:251–254

  18. Shao H, Wu W, Liu L (2006) Convergence of an online gradient algorithm with penalty for two-layer neural networks. In: MATH'06: Proceedings of the 10th WSEAS international conference on applied mathematics, Stevens Point, Wisconsin, USA. World Scientific and Engineering Academy and Society (WSEAS), pp 107–111

  19. Wu W, Feng G, Li Z, Xu Y (2005) Deterministic convergence of an online gradient method for BP neural networks. IEEE Trans Neural Netw 16:533–540

  20. Xu Z, Zhang R, Jing W (2009) When does online BP training converge? IEEE Trans Neural Netw 20:1529–1539

  21. Zhang H, Wu W (2009) Boundedness and convergence of online gradient method with penalty for linear output feedforward neural networks. Neural Process Lett 29:205–212

  22. Zhang H, Wu W, Liu F, Yao M (2009) Boundedness and convergence of online gradient method with penalty for feedforward neural networks. IEEE Trans Neural Netw 20:1050–1054

  23. Wu W, Li L, Yang J, Liu Y (2010) A modified gradient-based neuro-fuzzy learning algorithm and its convergence. Inform Sci 180:1630–1642

  24. Wu W, Wang J, Cheng M, Li Z (2011) Convergence analysis of online gradient method for BP neural networks. Neural Netw 24:91–98

  25. Wang J, Yang J, Wu W (2011) Convergence of cyclic and almost-cyclic learning with momentum for feedforward neural networks. IEEE Trans Neural Netw 22:1297–1306

  26. Shao H, Wu W, Li F, Zheng G (2007) Convergence of batch gradient algorithm for feedforward neural network training. J Inform Comput Sci 4:251–255

  27. Kaastra I, Boyd M (1996) Designing a neural network for forecasting financial and economic time series. Neurocomputing 10:215–236

  28. Azoff EM (1994) Neural network time series forecasting of financial markets. Wiley, New York

  29. Yuan Y, Sun W (2001) Optimization theory and methods. Science Press, Beijing


Acknowledgments

The authors are grateful for the contributions of the other team members, but take full responsibility for the views expressed in this paper. This work is supported in part by the Fundamental Research Funds for the Central Universities and the National Natural Science Foundation of China (Grant 10871220), and in part by Aberystwyth University, UK.

Author information


Correspondence to Yanpeng Qu.

Appendix

Two lemmas are first presented in order to prove the main theorem.

Lemma 1

Suppose that \({F:\mathbb{R}^Q\rightarrow \mathbb{R}}\) is continuous and differentiable on a compact set \({{\bf D}\subset\mathbb{R}^Q,}\) and that \(\overline{\varvec{\Upomega}}=\{{\bf x}\in {\bf D}\,|\,\frac{\partial F({\bf x})}{\partial{\bf x}}=0\}\) contains only a finite number of points. If a sequence \(\{{\bf x}^k\}\subset {\bf D}\) satisfies

$$ \lim\limits_{k\rightarrow\infty}\|{\bf x}^{k+1}-{\bf x}^k\|=0,\quad \lim\limits_{k\rightarrow\infty}\left\|\frac{\partial F({\bf x}^k)} {\partial{\bf x}}\right\|=0, $$

then there exists a point \({\bf x}^{\ast}\in \overline{\varvec{\Upomega}}\) such that \(\lim_{k\rightarrow\infty}{\bf x}^k={\bf x}^{\ast}.\)

Proof

This result is almost the same as Theorem 14.1.5 in [29]; the details of the proof are therefore omitted. \(\square\)

In preparation for proving the theorem, introduce the following notation for any 1 ≤ j ≤ J, 1 ≤ i ≤ n and \(k=0,1,2,\ldots\):

$$ \begin{aligned} \mathit{\Upphi}_0^{k,j}={\bf u}^k\cdot (f^{k,j}\odot G^{k,j}),\quad& \varphi^{k,j}=f^{k+1,j}-f^{k,j},\\ \quad&\psi^{k,j}=G^{k+1,j}-G^{k,j},\\ \end{aligned} $$
(21)
$$ \xi_i^{k,j}={\bf x}^j-{\bf a}_{\bf i}^k, \quad \mathit{\Upphi}_i^{k,j}=\xi_i^{k,j}\odot {\bf b}_{\bf i}^k. $$
(22)
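To make this notation concrete, the following minimal Python sketch (an illustration under stated assumptions, not the paper's implementation) computes the quantities in (21) and (22): the hidden activations \(f_i^{k,j}=f({\bf v}_{\bf i}^k\cdot{\bf x}^j)\), the Gaussian basis values \(h_i^{k,j}\) as in the proof of (24) below, the window outputs \(G_i^{k,j}=G(h_i^{k,j})\) and the network output \(\mathit{\Upphi}_0^{k,j}={\bf u}^k\cdot(f^{k,j}\odot G^{k,j})\), with “\(\odot\)” the componentwise product. The choices f = tanh and an identity window G are placeholders; the actual f and G are defined in the main text, which is not reproduced here.

import numpy as np

# Illustrative LCFNN forward pass for the notation in (21)-(22).
# Shapes: x is (m,); v, a, b are (m, n); u is (n,). f = tanh and
# G = identity are placeholder choices, not the paper's prescription.
def forward(x, u, v, a, b, f=np.tanh, G=lambda h: h):
    fx = f(v.T @ x)                                               # f_i = f(v_i . x)
    h = np.exp(-np.sum((x[:, None] - a) ** 2 * b ** 2, axis=0))   # Gaussian basis h_i
    return u @ (fx * G(h))                                        # Phi_0 = u . (f ⊙ G)

def error(W, X, O):
    # E(W) = sum_j g_j(Phi_0^j) with g_j(t) = (t - O^j)^2 / 2, consistent
    # with g_j'(t) = t - O^j and g_j''(t) = 1 as used in this Appendix.
    u, v, a, b = W
    return 0.5 * sum((forward(x, u, v, a, b) - o) ** 2 for x, o in zip(X, O))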

Lemma 2

Suppose that Assumptions (A1) and (A2) both hold, then for all 1 ≤ i ≤ n, 1 ≤ j ≤ J and \(k=0,1,2\ldots\)

$$ \|\mathit{\Upphi}_0^{k,j}\|\leq nC_0, \quad \|\xi_i^{k,j}\|\leq C_1, \quad \|\mathit{\Upphi}_i^{k,j}\|\leq C_1, \quad \|O^{j}\|\leq C_1, $$
(23)
$$ \sum\limits_{j=1}^J g_j^{\prime}(\mathit{\Upphi}_0^{k,j})(\Updelta {\bf u}^{k}\cdot (\varphi^{k,j}\odot \psi^{k,j}))\leq C_2\eta^2 \left\|\frac{\partial E({\bf W}^k)}{\partial{\bf W}}\right\|^2, $$
(24)
$$ \begin{aligned} &\sum\limits_{j=1}^J g_j^{\prime}(\mathit{\Upphi}_0^{k,j})(\Updelta{\bf u}^k\cdot(\varphi^{k,j}\odot G^{k,j}))\\ &\quad \leq C_3\eta^2\left( \left\|\frac{\partial E({\bf W}^k)} {\partial{\bf u}}\right\|^2+\sum\limits_{i=1}^n \left\|\frac{\partial E({\bf W}^k)}{\partial{\bf v}_{\bf i}}\right\|^2\right), \end{aligned} $$
(25)
$$ \begin{aligned} &\sum\limits_{j=1}^J g_j^{\prime} (\mathit{\Upphi}_0^{k,j})(\Updelta {\bf u}^k\cdot(f^{k,j}\odot\psi^{k,j}))\\ &\quad \leq C_4\eta^2 \left( \left\|\frac{\partial E({\bf W}^k)} {\partial{\bf u}}\right\|^2+ \sum\limits_{i=1}^n \left\|\frac{\partial E({\bf W}^k)}{\partial{\bf a}_{\bf i}}\right\|^2+ \sum\limits_{i=1}^n \left\| \frac{\partial E({\bf W}^k)}{\partial{\bf b}_{\bf i}}\right\|^2\right), \end{aligned} $$
(26)
$$ \begin{aligned} &\sum\limits_{j=1}^J g_j^{\prime} (\mathit{\Upphi}_0^{k,j})({\bf u}^k\cdot(\varphi^{k,j}\odot\psi^{k,j}))\\ &\quad\leq C_5\eta^2\sum\limits_{i=1}^n\left( \left\| \frac{\partial E({\bf W}^k)}{\partial{\bf v}_{\bf i}}\right\|^2+ \left\|\frac{\partial E({\bf W}^k)}{\partial{\bf a}_{\bf i}} \right\|^2+\left\|\frac{\partial E({\bf W}^k)}{\partial{\bf b}_{\bf i}}\right\|^2\right),\\ \end{aligned} $$
(27)
$$ \sum\limits_{j=1}^J g_j^{\prime}(\mathit{\Upphi}_0^{k,j})(\Updelta{\bf u}^k\cdot(f^{k,j}\odot G^{k,j}))= -\eta \left\|\frac{\partial E({\bf W}^k)}{\partial{\bf u}}\right\|^2, $$
(28)
$$ \sum\limits_{j=1}^J g_j^{\prime}(\mathit{\Upphi}_0^{k,j})({\bf u}^k\cdot(\varphi^{k,j}\odot G^{k,j}))\leq -(\eta-C_6\eta^2)\sum\limits_{i=1}^n\left\|\frac{\partial E({\bf W}^k)}{\partial{\bf v}_{\bf i}}\right\|^2, $$
(29)
$$ \begin{aligned} &\sum\limits_{j=1}^J g_j^{\prime} (\mathit{\Upphi}_0^{k,j})({\bf u}^k\cdot(f^{k,j}\odot \psi^{k,j}))\\ &\quad\leq -(\eta-C_7\eta^2)\sum\limits_{i=1}^n\left(\left\|\frac{\partial E({\bf W}^k)}{\partial{\bf a}_{\bf i}}\right\|^2+ \left\|\frac{\partial E({\bf W}^k)}{\partial{\bf b}_{\bf i}}\right\|^2\right),\\ \end{aligned} $$
(30)
$$ \frac{1}{2}\sum\limits_{j=1}^J g_j^{\prime\prime}(s_{k,j}) (\mathit{\Upphi}_0^{k+1,j}-\mathit{\Upphi}_0^{k,j})^2\leq C_8\eta^2\left\|\frac{\partial E({\bf W}^k)}{\partial{\bf W}}\right\|^2, $$
(31)

where \(C_m\) \((m=1,2,\ldots,8)\) are constants independent of k and j, and each \({s_{k,j}\in\mathbb{R}}\) is a constant lying on the segment between \(\mathit{\Upphi}_0^{k,j}\) and \({\mathit{\Upphi}}_0^{k+1,j}.\)

For readability, the convergence theorem is proved first by using the above two lemmas; the rather tedious proof of Lemma 2 is deferred to the end of this Appendix.

Proof of Theorem 1

The proof is divided into three parts, dealing with each of statements (i), (ii) and (iii), respectively. \(\square\)

Proof of Statement (i)

Using the Taylor expansion and Lemma 2, the following can be established for all \(k=0,1,2,\ldots\):

$$ \begin{aligned} &E({\bf W}^{k+1})-E({\bf W}^{k})\\ &\quad=\sum\limits_{j=1}^J (g_j(\mathit{\Upphi}_0^{k+1,j})-g_j(\mathit{\Upphi}_0^{k,j}))\\ &\quad=\sum\limits_{j=1}^J \left[\vphantom{\frac{1}{2}} g_j^{\prime}(\mathit{\Upphi}_0^{k,j})(\mathit{\Upphi}_0^{k+1,j}-\mathit{\Upphi}_0^{k,j})\right.\\ &\qquad\left. +\frac{1}{2}g_j^{\prime\prime} (s_{k,j})(\mathit{\Upphi}_0^{k+1,j}-\mathit{\Upphi}_0^{k,j})^2\right]\\ &\quad=\sum\limits_{j=1}^J \left[\vphantom{\frac{1}{2}} g_j^{\prime}(\mathit{\Upphi}_0^{k,j})({\bf u}^{k+1}\cdot (f^{k+1,j}\odot G^{k+1,j})\right.\\ &\qquad\left.-{\bf u}^{k}\cdot (f^{k,j}\odot G^{k,j}))+\frac{1}{2} g_j^{\prime\prime} (s_{k,j})(\mathit{\Upphi}_0^{k+1,j}-\mathit{\Upphi}_0^{k,j})^2\right]\\ &\quad=\sum\limits_{j=1}^J \left[g_j^{\prime} (\mathit{\Upphi}_0^{k,j})(\Updelta {\bf u}^{k}\cdot (\varphi^{k,j}\odot \psi^{k,j})+\Updelta{\bf u}^k\cdot(\varphi^{k,j}\odot G^{k,j})\right.\\ &\qquad+\Updelta {\bf u}^k\cdot(f^{k,j}\odot\psi^{k,j})+{\bf u}^k\cdot(\varphi^{k,j}\odot\psi^{k,j})\\ &\qquad+\Updelta{\bf u}^k\cdot(f^{k,j}\odot G^{k,j}) +{\bf u}^k\cdot(\varphi^{k,j}\odot G^{k,j})\\ &\qquad\left.+{\bf u}^k\cdot(f^{k,j}\odot \psi^{k,j}))\right]+ \frac{1}{2}\sum\limits_{j=1}^J g_j^{\prime\prime} (s_{k,j})(\mathit{\Upphi}_0^{k+1,j}-\mathit{\Upphi}_0^{k,j})^2\\ &\quad\leq C_2\eta^2 \left\|\frac{\partial E({\bf W}^k)}{\partial{\bf W}}\right\|^2\\ &\qquad+ C_3\eta^2\left(\left\|\frac{\partial E({\bf W}^k)}{\partial{\bf u}}\right\|^2 +\sum\limits_{i=1}^n\left\|\frac{\partial E({\bf W}^k)} {\partial{\bf v}_{\bf i}}\right\|^2\right)\\ &\qquad+C_4\eta^2 \left( \left\|\frac{\partial E({\bf W}^k)} {\partial{\bf u}}\right\|^2+\sum\limits_{i=1}^n \left\|\frac{\partial E({\bf W}^k)} {\partial{\bf a}_{\bf i}}\right\|^2\right.\\ &\qquad\left.+ \sum\limits_{i=1}^n \left\|\frac{\partial E({\bf W}^k)} {\partial{\bf b}_{\bf i}}\right\|^2\right)\\ &\qquad+C_5\eta^2\sum\limits_{i=1}^n\left( \left\|\frac{\partial E({\bf W}^k)}{\partial{\bf v}_{\bf i}}\right\|^2+ \left\|\frac{\partial E({\bf W}^k)}{\partial{\bf a}_{\bf i}}\right\|^2\right.\\ &\qquad\left.+\left\|\frac{\partial E({\bf W}^k)}{\partial{\bf b}_{\bf i}}\right\|^2\right)\\ &\qquad-\eta\left\|\frac{\partial E({\bf W}^k)}{\partial{\bf u}}\right\|^2-(\eta-C_6\eta^2)\sum\limits_{i=1}^n \left\|\frac{\partial E({\bf W}^k)}{\partial{\bf v}_{\bf i}}\right\|^2\\ &\qquad-(\eta-C_7\eta^2)\sum\limits_{i=1}^n\left( \left\|\frac{\partial E({\bf W}^k)}{\partial{\bf a}_{\bf i}}\right\|^2+ \left\|\frac{\partial E({\bf W}^k)} {\partial{\bf b}_{\bf i}}\right\|^2\right)\\ &\qquad+C_8\eta^2\left\|\frac{\partial E({\bf W}^k)} {\partial{\bf W}}\right\|^2\\ &\quad\leq C_2\eta^2\left\|\frac{\partial E({\bf W}^k)} {\partial{\bf W}}\right\|^2-\eta\left( \left\|\frac{\partial E({\bf W}^k)}{\partial{\bf u}}\right\|^2\right.\\ &\qquad\left.+\sum\limits_{i=1}^n\left( \left\|\frac{\partial E({\bf W}^k)} {\partial{\bf v}_{\bf i}}\right\|^2+ \left\|\frac{\partial E({\bf W}^k)} {\partial{\bf a}_{\bf i}}\right\|^2+ \left\|\frac{\partial E({\bf W}^k)}{\partial{\bf b}_{\bf i}}\right\|^2\right)\right)\\ &\qquad+(C_9+C_{10}+C_5)\eta^2\left( \left\|\frac{\partial E({\bf W}^k)}{\partial{\bf u}}\right\|^2\right.\\ &\qquad\left.+\sum\limits_{i=1}^n\left( \left\|\frac{\partial E({\bf W}^k)}{\partial{\bf v}_{\bf i}}\right\|^2+ \left\|\frac{\partial E({\bf W}^k)} {\partial{\bf a}_{\bf i}}\right\|^2+ \left\|\frac{\partial E({\bf W}^k)} {\partial{\bf b}_{\bf i}}\right\|^2\right)\right)\\ &\qquad+C_8\eta^2\left\|\frac{\partial E({\bf W}^k)}{\partial{\bf W}}\right\|^2\\ 
&\quad=-(\eta-C\eta^2)\left\|\frac{\partial E({\bf W}^k)}{\partial{\bf W}}\right\|^2, \end{aligned} $$
(32)

where \(C=C_2+C_5+C_8+C_9+C_{10}\) and \({s_{k,j}\in\mathbb{R}}\) lies on the segment between \(\mathit{\Upphi}_0^{k,j}\) and \(\mathit{\Upphi}_0^{k+1,j}\).

Write \(\beta=\eta-C\eta^2\). Then,

$$ E({\bf W}^{k+1})\leq E({\bf W}^{k})-\beta\left\|\frac{\partial E({\bf W}^k)}{\partial {\bf W}}\right\|^2. $$
(33)

Obviously, if the learning rate η is chosen such that

$$ 0< \eta < \frac{1}{C} $$
(34)

is satisfied, then the following holds

$$ E({\bf W}^{k+1})\leq E({\bf W}^{k}),\quad k=0,1,2,\ldots. $$
(35)

This proves statement (i). \(\square\)
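As a quick numerical illustration of statement (i), the hedged sketch below applies the constant-rate gradient step to the toy forward pass sketched after (22), using finite-difference gradients, and checks that \(E({\bf W}^{k+1})\leq E({\bf W}^{k})\) along the iteration. The data, the dimensions and the value η = 0.01 are arbitrary assumptions; no attempt is made to estimate the constant C.

# Hypothetical check of the monotone decrease (35) on random toy data;
# eta is simply assumed small enough that eta < 1/C holds here.
rng = np.random.default_rng(0)
m, n, J, eta, eps = 3, 4, 10, 0.01, 1e-6
X = [rng.standard_normal(m) for _ in range(J)]
O = [float(rng.standard_normal()) for _ in range(J)]
W = [0.1 * rng.standard_normal(s) for s in [(n,), (m, n), (m, n), (m, n)]]

def num_grad(W, i):
    # Forward-difference estimate of dE/dW[i].
    g = np.zeros_like(W[i])
    E0 = error(W, X, O)
    for idx in np.ndindex(W[i].shape):
        Wp = [w.copy() for w in W]
        Wp[i][idx] += eps
        g[idx] = (error(Wp, X, O) - E0) / eps
    return g

E_prev = error(W, X, O)
for k in range(50):
    W = [w - eta * num_grad(W, i) for i, w in enumerate(W)]   # W^{k+1} = W^k - eta * grad
    E_new = error(W, X, O)
    assert E_new <= E_prev + 1e-8   # monotone decrease, cf. (35)
    E_prev = E_new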

Proof of Statement (ii)

From (33), it follows that

$$ \begin{aligned} &E({\bf W}^{k+1})\\ &\quad \leq E({\bf W}^{k})-\beta\left\| \frac{\partial E({\bf W}^k)}{\partial {\bf W}}\right\|^2\\ &\quad \leq{\cdots}\leq E({\bf W}^{0})-\beta\sum\limits_{t=0}^k \left\|\frac{\partial E({\bf W}^t)}{\partial {\bf W}}\right\|^2. \end{aligned} $$

Since \(E({\bf W}^{k+1})\geq 0\), the following holds:

$$ \beta\sum\limits_{t=0}^k \left\|\frac{\partial E({\bf W}^t)} {\partial {\bf W}}\right\|^2\leq E({\bf W}^{0}). $$

Letting \(k\rightarrow \infty\) results in

$$ \sum\limits_{t=0}^\infty\left\|\frac{\partial E({\bf W}^t)} {\partial {\bf W}}\right\|^2\leq \frac{1}{\beta} E({\bf W}^{0})< \infty. $$

This immediately gives

$$ \lim\limits_{k\rightarrow\infty}\left\|\frac{\partial E({\bf W}^k)}{\partial {\bf W}}\right\|=0. $$
(36)

Statement (ii) is therefore proved. \(\square\)

Proof of Statement (iii)

It follows from (12) and (36) that

$$ \lim\limits_{k\rightarrow\infty}\|\Updelta {\bf W}^k\|=0. $$
(37)

Note that the error function E(W) defined in (6) is continuous and differentiable. According to (37), Assumption (A3) and Lemma 1, it is straightforward to show that there exists a point \({\bf W}^{\ast}\in\mathit{\Upomega}\) such that

$$ \lim\limits_{k\rightarrow\infty}{\bf W}^k={\bf W}^{\ast}. $$

Thus, statement (iii) is proved, and this completes the proof of Theorem 1. \(\square\)

What remains is to prove Lemma 2. This is done by proving (23) to (31) successively in the sequel.

Proof of Lemma 2 (23)

For a fixed and finite set of training patterns, the estimates of (23) can be established by using Assumption (A1) in conjunction with (7), (8), (11), (21), (22) and also with the definitions of operator “\(\odot\)” and window function \(G(\cdot)\). \(\square\)

Proof of Lemma 2 (24)

By using the Mean Value Theorem, for 1 ≤ i ≤ n, 1 ≤ j ≤ J and \(k=0,1,2,\ldots,\) the following can be established:

$$ \begin{aligned} &h_i^{k+1,j}-h_i^{k,j}\\ &\quad={\exp}\left(\sum\limits_{l=1}^m (-(x_l^j-a_{li}^{k+1})^2(b_{li}^{k+1})^2)\right)\\ &\qquad-{\exp}\left(\sum\limits_{l=1}^m (-(x_l^j-a_{li}^k)^2(b_{li}^k)^2)\right)\\ &\quad={\exp}(-(\xi_i^{k+1,j}\odot {\bf b}_{\bf i}^{k+1})\cdot(\xi_i^{k+1,j}\odot {\bf b}_{\bf i}^{k+1}))\\ &\qquad-{\exp}(-(\xi_i^{k,j}\odot {\bf b}_{\bf i}^k)\cdot(\xi_i^{k,j}\odot {\bf b}_{\bf i}^k))\\ &\quad={\exp}(-\mathit{\Upphi}_i^{k+1,j}\cdot \mathit{\Upphi}_i^{k+1,j})-{\exp}(-\mathit{\Upphi}_i^{k,j}\cdot \mathit{\Upphi}_i^{k,j})\\ &\quad={\exp}(t_{i}^{s,j})(-(\mathit{\Upphi}_i^{k+1,j}\cdot \mathit{\Upphi}_i^{k+1,j}-\mathit{\Upphi}_i^{k,j}\cdot \mathit{\Upphi}_i^{k,j}))\\ &\quad=-{\exp}(t_{i}^{s,j})((\mathit{\Upphi}_i^{k+1,j}+ \mathit{\Upphi}_i^{k,j})\cdot(\mathit{\Upphi}_i^{k+1,j}- \mathit{\Upphi}_i^{k,j})), \end{aligned} $$

where \(t_{i}^{s,j}\) lies in between \(-\mathit{\Upphi}_i^{k,j}\cdot \mathit{\Upphi}_i^{k,j}\) and \(-\mathit{\Upphi}_i^{k+1,j}\cdot \mathit{\Upphi}_i^{k+1,j}\). By (23) and the Properties 1) and 3) of operator “\(\odot\)”, it follows that

$$ \begin{aligned} &|h_i^{k+1,j}-h_i^{k,j}|\\ &\quad\leq 2C_1\|\mathit{\Upphi}_i^{k+1,j}- \mathit{\Upphi}_i^{k,j}\|=2C_1\|\xi_i^{k+1,j}\odot {\bf b}_{\bf i}^{k+1}-\xi_i^{k,j}\odot {\bf b}_{\bf i}^k\|\\ &\quad=2C_1\|\xi_i^{k+1,j}\odot {\bf b}_{\bf i}^{k+1}-\xi_i^{k,j}\odot {\bf b}_{\bf i}^{k+1} + \xi_i^{k,j}\odot {\bf b}_{\bf i}^{k+1}\\ &\qquad-\xi_i^{k,j}\odot {\bf b}_{\bf i}^k\|\\ &\quad=2C_1\|(\xi_i^{k+1,j}-\xi_i^{k,j})\odot {\bf b}_{\bf i}^{k+1} + \xi_i^{k,j}\odot ({\bf b}_{\bf i}^{k+1}-{\bf b}_{\bf i}^k)\|\\ &\quad\leq 2C_1\|(\xi_i^{k+1,j}-\xi_i^{k,j})\odot {\bf b}_{\bf i}^{k+1}\| + 2C_1\|\xi_i^{k,j}\odot ({\bf b}_{\bf i}^{k+1}-{\bf b}_{\bf i}^k)\|\\ &\quad\leq 2C_0C_1\|\xi_i^{k+1,j}-\xi_i^{k,j}\|+2C_1^2\|{\bf b}_{\bf i}^{k+1}-{\bf b}_{\bf i}^k\|\\ &\quad\leq C_{21}(\|\Updelta {\bf a}_{\bf i}^k\|+\|\Updelta {\bf b}_{\bf i}^k\|), \end{aligned} $$
(38)

where \(C_{21}=2C_1\max\{C_0,C_1\}\). Furthermore, for 1 ≤ i ≤ n, 1 ≤ j ≤ J and \(k=0,1,2,\ldots,\) the following holds:

$$ \begin{aligned} \psi_i^{k,j}&=G_i^{k+1,j}-G_i^{k,j} \\ &=G^{\prime}(\sigma_{i}^{p,j})(h_i^{k+1,j}-h_i^{k,j}),\\ \end{aligned} $$

where \(\sigma_{i}^{p,j}\) lies in between \(h_i^{k+1,j}\) and \(h_i^{k,j}\). Then, by (38) and Assumption (A2),

$$ \begin{aligned} |\psi_i^{k,j}|&=|G^{\prime}(\sigma_{i}^{p,j})||h_i^{k+1,j}-h_i^{k,j}|\\ &\leq M_1C_{21}(\|\Updelta {\bf a}_{\bf i}^k\|+\|\Updelta {\bf b}_{\bf i}^k\|). \end{aligned} $$
(39)

It follows from (39) that for any 1 ≤ j ≤ J and \(k=0,1,2,\ldots,\)

$$ \begin{aligned} \|\psi^{k,j}\|^2 &=\|G^{k+1,j}-G^{k,j}\|^2=\left\| \left(\begin{array}{c} G_1^{k+1,j}-G_1^{k,j}\\ G_2^{k+1,j}-G_2^{k,j}\\ \vdots\\ G_n^{k+1,j}-G_n^{k,j}\end{array}\right)\right\|^2\\ &\leq M_1^2C_{21}^2\sum\limits_{i=1}^n(\|\Updelta {\bf a}_{\bf i}^k\|+\|\Updelta {\bf b}_{\bf i}^k\|)^2\\ &\leq 2M_1^2C_{21}^2\sum\limits_{i=1}^n(\|\Updelta {\bf a}_{\bf i}^k\|^2+\|\Updelta {\bf b}_{\bf i}^k\|^2). \end{aligned} $$
(40)

By Assumption (A1), it can be derived that

$$ \|\psi^{k,j}\|\leq4\sqrt{n}M_1C_0C_{21}. $$
(41)

As for \(\varphi^{k,j}\),

$$ \begin{aligned} \varphi_i^{k,j}&=f_i^{k+1,j}-f_i^{k,j} \\ &=f({\bf v}_{\bf i}^{k+1}\cdot{\bf x}^j)-f({\bf v}_{\bf i}^{k}\cdot{\bf x}^j) \\ &=f^{\prime}(\tau_{i}^{r,j})({\bf v}_{\bf i}^{k+1}-{\bf v}_{\bf i}^{k})\cdot{\bf x}^j,\\ \end{aligned} $$

where \(\tau_{i}^{r,j}\) lies in between \({\bf v}_{\bf i}^{k+1}\cdot{\bf x}^j\) and \({\bf v}_{\bf i}^{k}\cdot{\bf x}^j\). Since \(|\frac{df(x)}{dx}|<1\), it follows that

$$ \begin{aligned} |\varphi_i^{k,j}|&=|f_i^{k+1,j}-f_i^{k,j}| \\ &\leq M\|\Updelta {\bf v}_{\bf i}^{k}\|, \end{aligned} $$
(42)

furthermore,

$$ \begin{aligned} \|\varphi^{k,j}\|^2&=\|f^{k+1,j}-f^{k,j}\|^2 \\ &\leq M^2\sum\limits_{i=1}^n\|\Updelta {\bf v}_{\bf i}^{k}\|^2. \end{aligned} $$
(43)

By Assumption (A1), the following holds:

$$ \|\varphi^{k,j}\|\leq2\sqrt{n}MC_0. $$
(44)

According to the definition of \(g_j(t)\) as expressed in (9), it is straightforward to establish that \(g_j^{\prime}(t)=t-O^j\). This together with (23) leads to \(|g_j^{\prime}(\mathit{\Upphi}_0^{k,j})|\leq (nC_0+C_1)\). Employing (41) and (44), it can be derived that

$$ \begin{aligned} &\sum\limits_{j=1}^J g_j^{\prime}(\mathit{\Upphi}_0^{k,j})(\Updelta {\bf u}^{k}\cdot (\varphi^{k,j}\odot \psi^{k,j}))\\ &\quad=\frac{1}{3}\sum\limits_{j=1}^J g_j^{\prime}(\mathit{\Upphi}_0^{k,j}) (\Updelta {\bf u}^{k}\cdot (\varphi^{k,j}\odot \psi^{k,j}))\\ &\qquad +\frac{1}{3}\sum\limits_{j=1}^J g_j^{\prime}(\mathit{\Upphi}_0^{k,j}) (\Updelta {\bf u}^{k}\cdot (\varphi^{k,j}\odot \psi^{k,j}))\\ &\qquad+\frac{1}{3}\sum\limits_{j=1}^J g_j^{\prime}(\mathit{\Upphi}_0^{k,j}) (\Updelta {\bf u}^{k}\cdot (\varphi^{k,j}\odot \psi^{k,j}))\\ &\quad \leq \frac{1}{3}(nC_0+C_1)\sum\limits_{j=1}^J \|\Updelta {\bf u}^{k}\|\|\varphi^{k,j}\odot \psi^{k,j}\| \\ &\qquad+\frac{1}{3}(nC_0+C_1)\sum\limits_{j=1}^J \|\Updelta {\bf u}^{k}\|\|\varphi^{k,j}\odot \psi^{k,j}\| \\ &\qquad+\frac{1}{3}(nC_0+C_1)\sum\limits_{j=1}^J \|\Updelta {\bf u}^{k}\|\|\varphi^{k,j}\odot \psi^{k,j}\| \\ &\quad\leq \frac{1}{3}(nC_0+C_1)\sum\limits_{j=1}^J \|\Updelta {\bf u}^{k}\|\|\varphi^{k,j}\|\| \psi^{k,j}\| \\ &\qquad+\frac{1}{3}(nC_0+C_1)\sum\limits_{j=1}^J \|\Updelta {\bf u}^{k}\|\|\varphi^{k,j}\|\| \psi^{k,j}\| \\ &\qquad+\frac{1}{3}(nC_0+C_1)\sum\limits_{j=1}^J \|\Updelta {\bf u}^{k}\|\|\varphi^{k,j}\|\| \psi^{k,j}\| \\ &\quad\leq \frac{4}{3}M_1C_0C_{21}{\sqrt{n}} (nC_0+C_1)\sum\limits_{j=1}^J \|\Updelta {\bf u}^{k}\|\|\varphi^{k,j}\| \\ &\qquad+\frac{2}{3}MC_0{\sqrt{n}}(nC_0+C_1)\sum\limits_{j=1}^J \|\Updelta {\bf u}^{k}\|\| \psi^{k,j}\| \\ &\qquad+\frac{2}{3}C_0(nC_0+C_1)\sum\limits_{j=1}^J \|\varphi^{k,j}\|\| \psi^{k,j}\| \\ &\quad\leq \frac{2}{3} M_1C_0C_{21}{\sqrt{n}} (nC_0+C_1)\sum\limits_{j=1}^J(\|\Updelta {\bf u}^{k}\|^2+\|\varphi^{k,j}\|^2) \\ &\qquad+\frac{1}{3} MC_0{\sqrt{n}}(nC_0+C_1)\sum\limits_{j=1}^J(\|\Updelta {\bf u}^{k}\|^2+\|\psi^{k,j}\|^2) \\ &\qquad+\frac{1}{3}C_0(nC_0+C_1)\sum\limits_{j=1}^J (\|\varphi^{k,j}\|^2+\|\psi^{k,j}\|^2)\\ &\quad\leq \frac{2}{3}JM_1C_0C_{21}{\sqrt{n}}(nC_0+C_1)\left(\|\Updelta {\bf u}^{k}\|^2+M^2\sum\limits_{i=1}^n\|\Updelta{\bf v}_{\bf i}^{k}\|^2\right)\\ &\qquad+\frac{1}{3}JMC_0{\sqrt{n}}(nC_0+C_1)(\|\Updelta {\bf u}^{k}\|^2\\ &\qquad+2M_1^2C_{21}^2\sum\limits_{i=1}^n(\|\Updelta {\bf a}_{\bf i}^k\|^2+\|\Updelta {\bf b}_{\bf i}^k\|^2))\\ &\qquad+\frac{1}{3}JC_0(nC_0+C_1)\left(M^2\sum\limits_{i=1}^n\|\Updelta{\bf v}_{\bf i}^{k}\|^2\right.\\ &\qquad\left.+2M_1^2C_{21}^2\sum\limits_{i=1}^n\left(\|\Updelta {\bf a}_{\bf i}^k\|^2+\|\Updelta {\bf b}_{\bf i}^k\|^2\right)\right)\\ &\quad\leq C_2\left(\|\Updelta {\bf u}^k\|^2+\sum\limits_{i=1}^n\|\Updelta{\bf v}_{\bf i}^{k}\|^2+\sum\limits_{i=1}^n\|\Updelta {\bf a}_{\bf i}^k\|^2+\sum\limits_{i=1}^n\|\Updelta {\bf b}_{\bf i}^k\|^2\right)\\ &\quad =C_2\eta^2\left( \left\|\frac{\partial E({\bf W}^k)} {\partial{\bf u}}\right\|^2+ \sum\limits_{i=1}^n \left\|\frac{\partial E({\bf W}^k)}{\partial{\bf v}_{\bf i}}\right\|^2\right.\\ &\qquad\left.+\sum\limits_{i=1}^n \left\|\frac{\partial E({\bf W}^k)} {\partial{\bf a}_{\bf i}}\right\|^2 +\sum\limits_{i=1}^n\left\| \frac{\partial E({\bf W}^k)}{\partial{\bf b}_{\bf i}}\right\|^2\right) \\ &\quad =C_2\eta^2\|\frac{\partial E({\bf W}^k)}{\partial{\bf W}}\|^2\\ \end{aligned} $$

where \(C_2=\frac{1}{3}JC_0(nC_0+C_1)\hbox{max}\{\sqrt{n}(2M_1C_{21}+M)\), \(M^2(2\sqrt{n}M_1C_{21}+1),2M_1^2C_{21}^2(\sqrt{n}M+1)\}\).

So Lemma 2 (24) is proved. \(\square\)

Proof of Lemma 2 (25)

Using the definition of window function \(G(\cdot)\), it can be established that

$$ \begin{aligned} &\sum\limits_{j=1}^J g_j^{\prime}(\mathit{\Upphi}_0^{k,j})(\Updelta{\bf u}^k\cdot(\varphi^{k,j}\odot G^{k,j}))\\ &\quad \leq(nC_0+C_1)\sum\limits_{j=1}^J\|\Updelta{\bf u}^k\|\|\varphi^{k,j}\|\|G^{k,j}\|\\ &\quad \leq{\sqrt{n}}(nC_0+C_1)\sum\limits_{j=1}^J\|\Updelta{\bf u}^k\|\|\varphi^{k,j}\|\\ &\quad \leq \frac{1}{2} {\sqrt{n}}(nC_0+C_1)\sum\limits_{j=1}^J(\|\Updelta{\bf u}^k\|^2+\|\varphi^{k,j}\|^2)\\ &\quad \leq \frac{1}{2}J{\sqrt{n}}(nC_0+C_1)(\|\Updelta {\bf u}^{k}\|^2+M^2\sum\limits_{i=1}^n\|\Updelta{\bf v}_{\bf i}^{k}\|^2)\\ &\quad \leq C_3\eta^2\left(\left\|\frac{\partial E({\bf W}^k)} {\partial{\bf u}}\right\|^2+\sum\limits_{i=1}^n\left\|\frac{\partial E({\bf W}^k)}{\partial{\bf v}_{\bf i}}\right\|^2\right),\\ \end{aligned} $$

where \(C_3=\frac{1}{2}J\sqrt{n}(nC_0+C_1)\hbox{max}\{1,M^2\}\). So Lemma 2 (25) is proved. \(\square\)

Proof of Lemma 2 (26)

With the definition of activation function f(x), the following can be derived:

$$ \begin{aligned} &\sum\limits_{j=1}^J g_j^{\prime}(\mathit{\Upphi}_0^{k,j})(\Updelta {\bf u}^k\cdot(f^{k,j}\odot\psi^{k,j})) \\ &\quad\leq(nC_0+C_1)\sum\limits_{j=1}^J\|\Updelta{\bf u}^k\|\|f^{k,j}\|\|\psi^{k,j}\| \\ &\quad\leq{\sqrt{n}}(nC_0+C_1)\sum\limits_{j=1}^J\|\Updelta{\bf u}^k\|\|\psi^{k,j}\| \\ &\quad\leq \frac{1}{2}{\sqrt{n}}(nC_0+C_1)\sum\limits_{j=1}^J(\|\Updelta{\bf u}^k\|^2+\|\psi^{k,j}\|^2)\\ &\quad\leq \frac{1}{2} J{\sqrt{n}}(nC_0+C_1)\left(\|\Updelta {\bf u}^{k}\|^2\right.\\ &\qquad\left.+2M_1^2C_{21}^2\sum\limits_{i=1}^n(\|\Updelta{\bf a}_{\bf i}^{k}\|^2+\|\Updelta{\bf b}_{\bf i}^{k}\|^2)\right)\\ &\quad\leq C_4\eta^2\left(\left\|\frac{\partial E({\bf W}^k)}{\partial{\bf u}}\right\|^2+\sum\limits_{i=1}^n \left\| \frac{\partial E({\bf W}^k)}{\partial{\bf a}_{\bf i}}\right\|^2\right.\\ &\qquad\left.+\sum\limits_{i=1}^n \left\|\frac{\partial E({\bf W}^k)} {\partial{\bf b}_{\bf i}}\right\|^2\right),\\ \end{aligned} $$

where \(C_4=\frac{1}{2}J\sqrt{n}(nC_0+C_1) \hbox{max}\{1,2M_1^2C_{21}^2\}\). So Lemma 2 (26) is proved. \(\square\)

Proof of Lemma 2 (27)

By Assumption (A1),

$$ \begin{aligned} &\sum\limits_{j=1}^J g_j^{\prime} (\mathit{\Upphi}_0^{k,j})({\bf u}^k\cdot(\varphi^{k,j}\odot\psi^{k,j})) \\ &\quad\leq(nC_0+C_1)\sum\limits_{j=1}^J\|{\bf u}^k\|\|\varphi^{k,j}\|\|\psi^{k,j}\| \\ &\quad\leq(nC_0+C_1)C_0\sum\limits_{j=1}^J \|\varphi^{k,j}\|\|\psi^{k,j}\|\\ &\quad\leq \frac{1}{2}(nC_0+C_1)C_0\sum\limits_{j=1}^J (\|\varphi^{k,j}\|^2+\|\psi^{k,j}\|^2)\\ &\quad\leq\frac{1}{2} J(nC_0+C_1)C_0 \left(M^2\sum\limits_{i=1}^n\|\Updelta {\bf v}_{\bf i}^{k}\|^2\right.\\ &\qquad\left.+2M_1^2C_{21}^2\sum\limits_{i=1}^n (\|\Updelta{\bf a}_{\bf i}^{k}\|^2+\|\Updelta{\bf b}_{\bf i}^{k}\|^2)\right)\\ &\quad\leq C_5\eta^2\sum\limits_{i=1}^n\left(\left\|\frac{\partial E({\bf W}^k)}{\partial{\bf v}_{\bf i}}\right\|^2+\left\|\frac{\partial E({\bf W}^k)}{\partial{\bf a}_{\bf i}}\right\|^2+\left\|\frac{\partial E({\bf W}^k)}{\partial{\bf b}_{\bf i}}\right\|^2\right),\\ \end{aligned} $$

where \(C_5=\frac{1}{2}JC_0(nC_0+C_1) \hbox{max}\{M^2,2M_1^2C_{21}^2\}\). So Lemma 2 (27) is proved. \(\square\)

Proof of Lemma 2 (28)

It follows from (14) that

$$ \begin{aligned} &\sum\limits_{j=1}^J g_j^{\prime}(\mathit{\Upphi}_0^{k,j})(\Updelta{\bf u}^k\cdot(f^{k,j}\odot G^{k,j}))\\ &\quad =\frac{\partial E({\bf W}^k)}{\partial{\bf u}}\cdot \left(-\eta \frac{\partial E({\bf W}^k)}{\partial{\bf u}}\right)\\ &\quad=-\eta\left\|\frac{\partial E({\bf W}^k)}{\partial{\bf u}}\right\|^2.\\ \end{aligned} $$

So Lemma 2 (28) is proved. \(\square\)

Proof of Lemma 2 (29)

Using the Taylor expansion, the following can be established:

$$ \begin{aligned} &{\bf u}^k\cdot(\varphi^{k,j}\odot G^{k,j})\\ &\quad =\sum\limits_{i=1}^nu_i^k(f_i^{k+1,j}-f_i^{k,j})G_i^{k,j}\\ &\quad =\sum\limits_{i=1}^nu_i^k\left(f^{\prime}({\bf v}_{\bf i}^{k}\cdot{\bf x}^j)(\Updelta {\bf v}_{\bf i}^{k}\cdot{\bf x}^j)+\frac{1}{2} f^{\prime\prime}(\widetilde{\tau}_i^{r,j})(\Updelta {\bf v}_{\bf i}^{k}\cdot{\bf x}^j)^2\right)G_i^{k,j}\\ &\quad =\sum\limits_{i=1}^nu_i^kf^{\prime}({\bf v}_{\bf i}^{k}\cdot{\bf x}^j)(\Updelta {\bf v}_{\bf i}^{k}\cdot{\bf x}^j)G_i^{k,j}\\ &\qquad+\frac{1}{2}\sum\limits_{i=1}^nu_i^kf^{\prime\prime} (\widetilde{\tau}_i^{r,j})(\Updelta {\bf v}_{\bf i}^{k}\cdot{\bf x}^j)^2G_i^{k,j}\\ &\quad \triangleq \,\Upgamma_1+\Upgamma_2, \end{aligned} $$
(45)

where \(\widetilde{\tau}_i^{r,j}\) lies in between \({\bf v}_{\bf i}^{k+1}\cdot{\bf x}^j\) and \({\bf v}_{\bf i}^{k}\cdot{\bf x}^j\).

From (15), the following can be derived:

$$ \begin{aligned} &\sum\limits_{j=1}^J g_j^{\prime}(\mathit{\Upphi}_0^{k,j})\Upgamma_1\\ &\quad =\sum\limits_{j=1}^J g_j^{\prime}(\mathit{\Upphi}_0^{k,j})\sum\limits_{i=1}^nu_i^kf^{\prime} ({\bf v}_{\bf i}^{k}\cdot{\bf x}^j)(\Updelta {\bf v}_{\bf i}^{k}\cdot{\bf x}^j)G_i^{k,j}\\ &\quad= \sum\limits_{i=1}^n \frac{\partial E({\bf W}^k)}{\partial{\bf v}_{\bf i}} \cdot \left(-\eta \frac{\partial E({\bf W}^k)}{\partial{\bf v}_{\bf i}}\right)\\ &\quad =-\eta\sum\limits_{i=1}^n \left\|\frac{\partial E({\bf W}^k)}{\partial{\bf v}_{\bf i}}\right\|^2.\\ \end{aligned} $$
(46)

Noting that \(|\frac{d^2f(x)}{dx^2}|<1\) and recalling the definition of \(G(\cdot)\), it follows that

$$ \begin{aligned} &\sum\limits_{j=1}^J g_j^{\prime}(\mathit{\Upphi}_0^{k,j})\Upgamma_2\\ &\quad=\frac{1}{2}\sum\limits_{j=1}^J g_j^{\prime}(\mathit{\Upphi}_0^{k,j}) \sum\limits_{i=1}^nu_i^kf^{\prime\prime} (\widetilde{\tau}_i^{r,j})(\Updelta {\bf v}_{\bf i}^{k}\cdot{\bf x}^j)^2G_i^{k,j}\\ &\quad\leq \frac{1}{2} M^2C_0(nC_0+C_1) \sum\limits_{j=1}^J\sum\limits_{i=1}^n\|\Updelta {\bf v}_{\bf i}^{k}\|^2\\ &\quad \leq C_{6}\eta^2\sum\limits_{i=1}^n \left\|\frac{\partial E({\bf W}^k)}{\partial{\bf v}_{\bf i}}\right\|^2, \end{aligned} $$
(47)

where \(C_{6}=\frac{1}{2}JM^2C_0(nC_0+C_1)\). According to (46) and (47), it can be derived that

$$ \begin{aligned} &\sum\limits_{j=1}^J g_j^{\prime}(\mathit{\Upphi}_0^{k,j}){\bf u}^k\cdot(\varphi^{k,j}\odot G^{k,j}) \\ &\quad=\sum\limits_{j=1}^J g_j^{\prime}(\mathit{\Upphi}_0^{k,j})\Upgamma_1+\sum\limits_{j=1}^J g_j^{\prime}(\mathit{\Upphi}_0^{k,j})\Upgamma_2 \\ &\quad\leq-\eta\sum\limits_{i=1}^n \left\|\frac{\partial E({\bf W}^k)}{\partial{\bf v}_{\bf i}}\right\|^2+ C_{6}\eta^2\sum\limits_{i=1}^n \left\| \frac{\partial E({\bf W}^k)}{\partial{\bf v}_{\bf i}}\right\|^2 \\ &\quad =-(\eta-C_{6}\eta^2)\sum\limits_{i=1}^n\left\| \frac{\partial E({\bf W}^k)}{\partial{\bf v}_{\bf i}}\right\|^2.\\ \end{aligned} $$

So Lemma 2 (29) is proved. \(\square\)

Proof of Lemma 2 (30)

Using the Taylor expansion, it follows that

$$ \begin{aligned} &{\bf u}^k\cdot(f^{k,j}\odot \psi^{k,j})\\ &\quad=\sum\limits_{i=1}^nu_i^kf_i^{k,j}(G_i^{k+1,j}-G_i^{k,j})\\ &\quad=\sum\limits_{i=1}^nu_i^kf_i^{k,j}(G^{\prime} (h_i^{k,j})(h_i^{k+1,j}-h_i^{k,j})\\ &\qquad+\frac{1}{2}G^{\prime\prime} (\widetilde{\sigma}_{i}^{p,j})(h_i^{k+1,j}-h_i^{k,j})^2)\\ &\quad=\sum\limits_{i=1}^nu_i^kf_i^{k,j}G^{\prime} (h_i^{k,j})(h_i^{k+1,j}-h_i^{k,j})\\ &\qquad+\frac{1}{2}\sum\limits_{i=1}^nu_i^kf_i^{k,j}G^{\prime\prime} (\widetilde{\sigma}_{i}^{p,j})(h_i^{k+1,j}-h_i^{k,j})^2\\ &\quad\triangleq\, \mathit{\Upomega}_1+\mathit{\Upomega}_2, \end{aligned} $$
(48)

where \(\widetilde{\sigma}_{i}^{p,j}\) lies in between \(h_i^{k+1,j}\) and \(h_i^{k,j}\). Thus,

$$ \begin{aligned} &\mathit{\Upomega}_1\\ &\quad=\sum\limits_{i=1}^nu_i^kf_i^{k,j}G^{\prime} (h_i^{k,j})(h_i^{k+1,j}-h_i^{k,j})\\ &\quad=\sum\limits_{i=1}^nu_i^kf_i^{k,j}G^{\prime} (h_i^{k,j})({\exp}(-\mathit{\Upphi}_i^{k+1,j}\cdot\mathit{\Upphi}_i^{k+1,j})\\ &\qquad-{\exp}(-\mathit{\Upphi}_i^{k,j}\cdot\mathit{\Upphi}_i^{k,j}))\\ &\quad=\sum\limits_{i=1}^nu_i^kf_i^{k,j}G^{\prime} (h_i^{k,j}) \left[\vphantom{\frac{1}{2}} {\exp}(-\mathit{\Upphi}_i^{k,j}\cdot\mathit{\Upphi}_i^{k,j})\right.\\ &\qquad\cdot (-(\mathit{\Upphi}_i^{k+1,j}\cdot\mathit{\Upphi}_i^{k+1,j} -\mathit{\Upphi}_i^{k,j}\cdot\mathit{\Upphi}_i^{k,j}))\\ &\qquad\left.+\frac{1}{2}{\exp}(\widetilde{t}_i^{s,j}) (-(\mathit{\Upphi}_i^{k+1,j}\cdot\mathit{\Upphi}_i^{k+1,j} -\mathit{\Upphi}_i^{k,j}\cdot\mathit{\Upphi}_i^{k,j}))^2\right]\\ &\quad=\sum\limits_{i=1}^nu_i^kf_i^{k,j}G^{\prime}(h_i^{k,j}) h_i^{k,j}(-(\mathit{\Upphi}_i^{k+1,j}\cdot\mathit{\Upphi}_i^{k+1,j}- \mathit{\Upphi}_i^{k,j}\cdot\mathit{\Upphi}_i^{k,j}))\\ &\qquad+\frac{1}{2}\sum\limits_{i=1}^nu_i^kf_i^{k,j} G^{\prime}(h_i^{k,j}){\exp}(\widetilde{t}_i^{s,j})\\ &\qquad\cdot(\mathit{\Upphi}_i^{k+1,j}\cdot\mathit{\Upphi}_i^{k+1,j} -\mathit{\Upphi}_i^{k,j}\cdot\mathit{\Upphi}_i^{k,j})^2\\ &\quad\triangleq \Updelta_1+\Updelta_2,\\ \end{aligned} $$
(49)

where \(\widetilde{t}_i^{s,j}\) lies in between \(-\mathit{\Upphi}_i^{k+1,j}\cdot \mathit{\Upphi}_i^{k+1,j}\) and \(-\mathit{\Upphi}_i^{k,j}\cdot \mathit{\Upphi}_i^{k,j}\). It follows from the properties of the operator “\(\odot\)” that

$$ \begin{aligned} &\Updelta_1\\ &\quad=\sum\limits_{i=1}^nu_i^kf_i^{k,j}G^{\prime} (h_i^{k,j}) h_i^{k,j}(-(\mathit{\Upphi}_i^{k+1,j}\cdot\mathit{\Upphi}_i^{k+1,j}- \mathit{\Upphi}_i^{k,j}\cdot\mathit{\Upphi}_i^{k,j}))\\ &\quad=\sum\limits_{i=1}^nu_i^kf_i^{k,j}G^{\prime} (h_i^{k,j})h_i^{k,j} [-(2\mathit{\Upphi}_i^{k,j}\cdot(\mathit{\Upphi}_i^{k+1,j}-\mathit{\Upphi}_i^{k,j})\\ &\qquad+(\mathit{\Upphi}_i^{k+1,j}-\mathit{\Upphi}_i^{k,j}) \cdot(\mathit{\Upphi}_i^{k+1,j}-\mathit{\Upphi}_i^{k,j}))]\\ &\quad=-2\sum\limits_{i=1}^nu_i^kf_i^{k,j}G^{\prime}(h_i^{k,j})h_i^{k,j} (\mathit{\Upphi}_i^{k,j}\cdot(\mathit{\Upphi}_i^{k+1,j}-\mathit{\Upphi}_i^{k,j}))\\ &\qquad-\sum\limits_{i=1}^nu_i^kf_i^{k,j}G^{\prime}(h_i^{k,j})h_i^{k,j} \|\mathit{\Upphi}_i^{k+1,j}-\mathit{\Upphi}_i^{k,j}\|^2\\ &\quad=-2\sum\limits_{i=1}^nu_i^kf_i^{k,j}G^{\prime}(h_i^{k,j})h_i^{k,j} (\mathit{\Upphi}_i^{k,j}\cdot((\xi_i^{k+1,j}-\xi_i^{k,j})\odot {\bf b}_{\bf i}^{k+1}\\ &\qquad+ \xi_i^{k,j}\odot ({\bf b}_{\bf i}^{k+1}-{\bf b}_{\bf i}^k)))-\delta\\ &\quad=-2\sum\limits_{i=1}^nu_i^kf_i^{k,j}G^{\prime}(h_i^{k,j})h_i^{k,j} ((\xi_i^{k,j}\odot {\bf b}_{\bf i}^k)\cdot((-\Updelta {\bf a}_{\bf i}^k)\odot {\bf b}_{\bf i}^{k+1}\\ &\qquad+\xi_i^{k,j}\odot \Updelta {\bf b}_{\bf i}^k))-\delta\\ &\quad=2\sum\limits_{i=1}^nu_i^kf_i^{k,j}G^{\prime}(h_i^{k,j})h_i^{k,j} (\xi_i^{k,j}\odot {\bf b}_{\bf i}^k)\cdot(\Updelta {\bf a}_{\bf i}^k\odot {\bf b}_{\bf i}^{k+1})\\ &\qquad-2\sum\limits_{i=1}^nu_i^kf_i^{k,j}G^{\prime} (h_i^{k,j})h_i^{k,j}(\xi_i^{k,j}\odot {\bf b}_{\bf i}^k)\cdot (\xi_i^{k,j}\odot \Updelta {\bf b}_{\bf i}^k)-\delta\\ &\quad=2\sum\limits_{i=1}^nu_i^kf_i^{k,j}G^{\prime}(h_i^{k,j})h_i^{k,j} (\xi_i^{k,j}\odot {\bf b}_{\bf i}^k\odot {\bf b}_{\bf i}^{k+1})\cdot\Updelta {\bf a}_{\bf i}^k\\ &\qquad-2\sum\limits_{i=1}^nu_i^kf_i^{k,j} G^{\prime}(h_i^{k,j})h_i^{k,j}(\xi_i^{k,j}\odot\xi_i^{k,j} \odot {\bf b}_{\bf i}^k)\cdot\Updelta {\bf b}_{\bf i}^k-\delta\\ &\quad=2\sum\limits_{i=1}^nu_i^kf_i^{k,j}G^{\prime}(h_i^{k,j})h_i^{k,j} (\xi_i^{k,j}\odot {\bf b}_{\bf i}^k\odot {\bf b}_{\bf i}^{k})\cdot\Updelta {\bf a}_{\bf i}^k\\ &\qquad+2\sum\limits_{i=1}^nu_i^kf_i^{k,j}G^{\prime}(h_i^{k,j})h_i^{k,j} (\xi_i^{k,j}\odot {\bf b}_{\bf i}^k\odot \Updelta {\bf b}_{\bf i}^k)\cdot\Updelta {\bf a}_{\bf i}^k\\ &\qquad-2\sum\limits_{i=1}^nu_i^kf_i^{k,j} G^{\prime}(h_i^{k,j})h_i^{k,j}(\xi_i^{k,j}\odot\xi_i^{k,j} \odot {\bf b}_{\bf i}^k)\cdot\Updelta {\bf b}_{\bf i}^k-\delta,\\ \end{aligned} $$
(50)

where \(\delta=\sum\nolimits_{i=1}^nu_i^kf_i^{k,j} G^{\prime}(h_i^{k,j})h_i^{k,j} \|\mathit{\Upphi}_i^{k+1,j}-\mathit{\Upphi}_i^{k,j}\|^2\). Then, according to (18) and (19),

$$ \begin{aligned} &\sum\limits_{j=1}^J g_j^{\prime}(\mathit{\Upphi}_0^{k,j})\Updelta_1\\ &\quad=2\sum\limits_{j=1}^Jg_j^{\prime}(\mathit{\Upphi}_0^{k,j}) \sum\limits_{i=1}^nu_i^kf_i^{k,j}G^{\prime}(h_i^{k,j})h_i^{k,j} (\xi_i^{k,j}\odot {\bf b}_{\bf i}^k\odot {\bf b}_{\bf i}^{k})\cdot\Updelta {\bf a}_{\bf i}^k\\ &\qquad+2\sum\limits_{j=1}^Jg_j^{\prime}(\mathit{\Upphi}_0^{k,j}) \sum\limits_{i=1}^nu_i^kf_i^{k,j}G^{\prime}(h_i^{k,j})h_i^{k,j} (\xi_i^{k,j}\odot {\bf b}_{\bf i}^k\odot \Updelta {\bf b}_{\bf i}^k)\cdot\Updelta {\bf a}_{\bf i}^k\\ &\qquad-2\sum\limits_{j=1}^Jg_j^{\prime}(\mathit{\Upphi}_0^{k,j}) \sum\limits_{i=1}^nu_i^kf_i^{k,j}G^{\prime} (h_i^{k,j})h_i^{k,j}(\xi_i^{k,j}\odot\xi_i^{k,j} \odot {\bf b}_{\bf i}^k)\cdot\Updelta {\bf b}_{\bf i}^k\\ &\qquad-\sum\limits_{j=1}^Jg_j^{\prime}(\mathit{\Upphi}_0^{k,j})\delta \\ &\quad=\sum\limits_{i=1}^n \frac{\partial E({\bf W}^k)}{\partial{\bf a}_{\bf i}}\cdot\Updelta {\bf a}_{\bf i}^k+2\sum\limits_{j=1}^Jg_j^{\prime}(\mathit{\Upphi}_0^{k,j}) \sum\limits_{i=1}^nu_i^kf_i^{k,j}G^{\prime}(h_i^{k,j})\\ &\qquad\cdot h_i^{k,j}(\xi_i^{k,j}\odot {\bf b}_{\bf i}^k\odot \Updelta {\bf b}_{\bf i}^k)\cdot\Updelta {\bf a}_{\bf i}^k\\ &\qquad +\sum\limits_{i=1}^n \frac{\partial E({\bf W}^k)}{\partial{\bf b}_{\bf i}}\cdot\Updelta {\bf b}_{\bf i}^k-\sum\limits_{j=1}^Jg_j^{\prime}(\mathit{\Upphi}_0^{k,j})\delta.\\ \end{aligned} $$
(51)

With Assumptions (A1) and (A2), and also (23) plus Property 1) of “\(\odot\)”, the following can be established:

$$ \begin{aligned} &2\sum\limits_{j=1}^Jg_j^{\prime}(\mathit{\Upphi}_0^{k,j}) \sum\limits_{i=1}^nu_i^kf_i^{k,j}G^{\prime}(h_i^{k,j})\\ &\qquad\cdot h_i^{k,j} (\xi_i^{k,j}\odot {\bf b}_{\bf i}^k\odot \Updelta {\bf b}_{\bf i}^k)\cdot\Updelta {\bf a}_{\bf i}^k\\ &\quad\leq2M_1C_0C_1(nC_0+C_1)\sum\limits_{j=1}^J\sum\limits_{i=1}^n \|\Updelta {\bf b}_{\bf i}^k\|\|\Updelta {\bf a}_{\bf i}^k\|\\ &\quad\leq JM_1C_0C_1(nC_0+C_1)\sum\limits_{i=1}^n(\|\Updelta {\bf a}_{\bf i}^k\|^2+\|\Updelta {\bf b}_{\bf i}^k\|^2), \end{aligned} $$
(52)

and

$$ \begin{aligned} &-\sum\limits_{j=1}^Jg_j^{\prime}(\mathit{\Upphi}_0^{k,j})\delta\\ &\quad=-\sum\limits_{j=1}^Jg_j^{\prime}(\mathit{\Upphi}_0^{k,j}) \sum\limits_{i=1}^nu_i^kf_i^{k,j}G^{\prime}(h_i^{k,j})h_i^{k,j} \|\mathit{\Upphi}_i^{k+1,j}-\mathit{\Upphi}_i^{k,j}\|^2\\ &\quad\leq M_1C_0(nC_0+C_1)\sum\limits_{j=1}^J \sum\limits_{i=1}^n\|\mathit{\Upphi}_i^{k+1,j}-\mathit{\Upphi}_i^{k,j}\|^2\\ &\quad= JM_1C_0(nC_0+C_1)\sum\limits_{i=1}^n \|(-\Updelta {\bf a}_{\bf i}^k)\odot {\bf b}_{\bf i}^{k+1}\\ &\qquad+\xi_i^{k,j}\odot \Updelta {\bf b}_{\bf i}^k\|^2\\ &\quad\leq JM_1C_0(nC_0+C_1)\sum\limits_{i=1}^n((C_0^2+C_0C_1)\|\Updelta {\bf a}_{\bf i}^k\|^2\\ &\qquad+(C_1^2+C_0C_1)\|\Updelta {\bf b}_{\bf i}^k\|^2)\\ &\quad\leq C_{71}\sum\limits_{i=1}^n(\|\Updelta {\bf a}_{\bf i}^k\|^2+\|\Updelta {\bf b}_{\bf i}^k\|^2), \end{aligned} $$
(53)

where \(C_{71}=JM_1C_0(C_0+C_1)(nC_0+C_1)\max\{C_0,C_1\}\). The combination of (51), (52) and (53) leads to

$$ \begin{aligned} &\sum\limits_{j=1}^J g_j^{\prime}(\mathit{\Upphi}_0^{k,j})\Updelta_1 \leq \sum\limits_{i=1}^n \frac{\partial E({\bf W}^k)}{\partial{\bf a}_{\bf i}}\cdot\Updelta {\bf a}_{\bf i}^k \\ &\qquad+\sum\limits_{i=1}^n \frac{\partial E({\bf W}^k)} {\partial{\bf b}_{\bf i}}\cdot\Updelta {\bf b}_{\bf i}^k+ C_{72}\sum\limits_{i=1}^n(\|\Updelta {\bf a}_{\bf i}^k\|^2+\|\Updelta {\bf b}_{\bf i}^k\|^2), \end{aligned} $$
(54)

where \(C_{72}=C_{71}+JM_1C_0C_1(nC_0+C_1)\). Furthermore,

$$ \begin{aligned} &\sum\limits_{j=1}^J g_j^{\prime}(\mathit{\Upphi}_0^{k,j})\Updelta_2\\ &\quad=\frac{1}{2}\sum\limits_{j=1}^J g_j^{\prime}(\mathit{\Upphi}_0^{k,j})\sum\limits_{i=1}^nu_i^kf_i^{k,j} G^{\prime}(h_i^{k,j}){\exp}(\widetilde{t}_i^{s,j})\\ &\qquad\cdot(\mathit{\Upphi}_i^{k+1,j}\cdot\mathit{\Upphi}_i^{k+1,j}- \mathit{\Upphi}_i^{k,j}\cdot\mathit{\Upphi}_i^{k,j})^2\\ &\quad\leq \frac{1}{2}M_1C_0(nC_0+C_1) \sum\limits_{j=1}^J\sum\limits_{i=1}^n (\mathit{\Upphi}_i^{k+1,j}\cdot\mathit{\Upphi}_i^{k+1,j}- \mathit{\Upphi}_i^{k,j}\cdot\mathit{\Upphi}_i^{k,j})^2\\ &\quad=\frac{1}{2}M_1C_0(nC_0+C_1) \sum\limits_{j=1}^J\sum\limits_{i=1}^n [(\mathit{\Upphi}_i^{k+1,j}+\mathit{\Upphi}_i^{k,j})\cdot (\mathit{\Upphi}_i^{k+1,j}-\mathit{\Upphi}_i^{k,j})]^2\\ &\quad\leq 2M_1C_0C_1^2(nC_0+C_1)\sum\limits_{j=1}^J \sum\limits_{i=1}^n\|\mathit{\Upphi}_i^{k+1,j}-\mathit{\Upphi}_i^{k,j}\|^2\\ &\quad\leq 2JM_1C_0C_1^2(nC_0+C_1)\sum\limits_{i=1}^n((C_0^2+C_0C_1)\|\Updelta {\bf a}_{\bf i}^k\|^2\\ &\qquad+(C_1^2+C_0C_1)\|\Updelta {\bf b}_{\bf i}^k\|^2)\\ &\quad\leq C_{73}\sum\limits_{i=1}^n(\|\Updelta {\bf a}_{\bf i}^k\|^2+\|\Updelta {\bf b}_{\bf i}^k\|^2), \end{aligned} $$
(55)

where \(C_{73}=2JM_1C_0C_1^2(C_0+C_1)(nC_0+C_1)\max\{C_0,C_1\}\). The combination of (49), (54) and (55) leads to

$$ \begin{aligned} &\sum\limits_{j=1}^J g_j^{\prime}(\mathit{\Upphi}_0^{k,j})\mathit{\Upomega}_1\\ &\quad=\sum\limits_{j=1}^J g_j^{\prime}(\mathit{\Upphi}_0^{k,j})\Updelta_1 +\sum\limits_{j=1}^J g_j^{\prime}(\mathit{\Upphi}_0^{k,j})\Updelta_2\\ &\quad\leq \sum\limits_{i=1}^n \frac{\partial E({\bf W}^k)}{\partial{\bf a}_{\bf i}}\cdot\Updelta {\bf a}_{\bf i}^k +\sum\limits_{i=1}^n \frac{\partial E({\bf W}^k)}{\partial{\bf b}_{\bf i}}\cdot\Updelta {\bf b}_{\bf i}^k\\ &\quad+(C_{72}+C_{73})\sum\limits_{i=1}^n(\|\Updelta {\bf a}_{\bf i}^k\|^2+\|\Updelta {\bf b}_{\bf i}^k\|^2)\\ &\quad=-\eta\sum\limits_{i=1}^n\left( \left\|\frac{\partial E({\bf W}^k)} {\partial{\bf a}_{\bf i}}\right\|^2+\left\|\frac{\partial E({\bf W}^k)}{\partial{\bf b}_{\bf i}}\right\|^2\right)\\ &\qquad+(C_{72}+C_{73})\eta^2\sum\limits_{i=1}^n\left( \left\|\frac{\partial E({\bf W}^k)}{\partial{\bf a}_{\bf i}}\right\|^2+\left\|\frac{\partial E({\bf W}^k)} {\partial{\bf b}_{\bf i}}\right\|^2\right).\\ \end{aligned} $$
(56)

As for \(\mathit{\Upomega}_2\), from Assumption (A2) and (38), it can be seen that

$$ \begin{aligned} \mathit{\Upomega}_2&=\frac{1}{2}\sum\limits_{i=1}^nu_i^kf_i^{k,j}G^{\prime\prime} (\widetilde{\sigma}_i^{p,j})(h_i^{k+1,j}-h_i^{k,j})^2\\ &\leq M_2C_0\sum\limits_{i=1}^n|h_i^{k+1,j}-h_i^{k,j}|^2 \\ &\leq M_2C_0C_{21}^2\sum\limits_{i=1}^n(\|\Updelta {\bf a}_{\bf i}^k\|+\|\Updelta {\bf b}_{\bf i}^k\|)^2\\ &\leq 2M_2C_0C_{21}^2\sum\limits_{i=1}^n(\|\Updelta {\bf a}_{\bf i}^k\|^2+\|\Updelta {\bf b}_{\bf i}^k\|^2),\\ \end{aligned} $$
(57)

then

$$ \begin{aligned} &\sum\limits_{j=1}^J g_j^{\prime}(\mathit{\Upphi}_0^{k,j})\mathit{\Upomega}_2\\ &\quad\leq 2M_2C_0C_{21}^2\sum\limits_{j=1}^J g_j^{\prime}(\mathit{\Upphi}_0^{k,j})\sum\limits_{i=1}^n(\|\Updelta {\bf a}_{\bf i}^k\|^2+\|\Updelta {\bf b}_{\bf i}^k\|^2)\\ &\quad\leq 2JM_2C_0C_{21}^2(nC_0+C_1)\sum\limits_{i=1}^n(\|\Updelta {\bf a}_{\bf i}^k\|^2+\|\Updelta {\bf b}_{\bf i}^k\|^2)\\ &\quad\leq C_{74}\eta^2\sum\limits_{i=1}^n\left( \left\|\frac{\partial E({\bf W}^k)}{\partial{\bf a}_{\bf i}}\right\|^2+ \left\|\frac{\partial E({\bf W}^k)} {\partial{\bf b}_{\bf i}}\right\|^2\right),\\ \end{aligned} $$
(58)

where \(C_{74}=2JM_2C_0C_{21}^2(nC_0+C_1)\). The combination of (56) and (58) leads to

$$ \begin{aligned} &\sum\limits_{j=1}^J g_j^{\prime}(\mathit{\Upphi}_0^{k,j}){\bf u}^k\cdot(f^{k,j}\odot \psi^{k,j}) \\ &\quad=\sum\limits_{j=1}^J g_j^{\prime}(\mathit{\Upphi}_0^{k,j})\mathit{\Upomega}_1+\sum\limits_{j=1}^J g_j^{\prime}(\mathit{\Upphi}_0^{k,j})\mathit{\Upomega}_2 \\ &\quad\leq-\eta\sum\limits_{i=1}^n\left( \left\|\frac{\partial E({\bf W}^k)}{\partial{\bf a}_{\bf i}}\right\|^2+ \left\|\frac{\partial E({\bf W}^k)} {\partial{\bf b}_{\bf i}}\right\|^2\right)\\ &\qquad+(C_{72}+C_{73})\eta^2\sum\limits_{i=1}^n\left( \left\|\frac{\partial E({\bf W}^k)}{\partial{\bf a}_{\bf i}}\right\|^2+\left\|\frac{\partial E({\bf W}^k)} {\partial{\bf b}_{\bf i}}\right\|^2\right)\\ &\qquad+ C_{74}\eta^2\sum\limits_{i=1}^n\left( \left\|\frac{\partial E({\bf W}^k)}{\partial{\bf a}_{\bf i}}\right\|^2+ \left\|\frac{\partial E({\bf W}^k)} {\partial{\bf b}_{\bf i}}\right\|^2\right)\\ &\quad=-\eta\sum\limits_{i=1}^n\left( \left\|\frac{\partial E({\bf W}^k)}{\partial{\bf a}_{\bf i}}\right\|^2+\left\| \frac{\partial E({\bf W}^k)}{\partial{\bf b}_{\bf i}}\right\|^2\right)\\ &\qquad+C_7\eta^2\sum\limits_{i=1}^n\left( \left\|\frac{\partial E({\bf W}^k)}{\partial{\bf a}_{\bf i}}\right\|^2+\left\| \frac{\partial E({\bf W}^k)}{\partial{\bf b}_{\bf i}}\right\|^2\right)\\ &\quad=-(\eta-C_7\eta^2)\sum\limits_{i=1}^n\left( \left\|\frac{\partial E({\bf W}^k)}{\partial{\bf a}_{\bf i}}\right\|^2+ \left\|\frac{\partial E({\bf W}^k)}{\partial{\bf b}_{\bf i}}\right\|^2\right),\\ \end{aligned} $$
(59)

where \(C_7=C_{72}+C_{73}+C_{74}\). So Lemma 2 (30) is proved. \(\square\)

Proof of Lemma 2 (31)

By the definition of \(g_j(t)\) as expressed in (9), it is straightforward to derive that \(g_j^{\prime\prime}(t)=1\), and that \(G(\cdot)\) is bounded on \([0,+\infty)\). Employing Assumption (A1), together with (40) and (43), the following can be established:

$$ \begin{aligned} &\frac{1}{2}\sum\limits_{j=1}^J g_j^{\prime\prime}(s_{k,j})(\mathit{\Upphi}_0^{k+1,j}-\mathit{\Upphi}_0^{k,j})^2 \\ &\quad=\frac{1}{2}\sum\limits_{j=1}^J \|\mathit{\Upphi}_0^{k+1,j}-\mathit{\Upphi}_0^{k,j}\|^2 \\ &\quad=\frac{1}{2}\sum\limits_{j=1}^J \|{\bf u}^{k+1}\cdot (f^{k+1,j}\odot G^{k+1,j})-{\bf u}^{k}\cdot (f^{k,j}\odot G^{k,j})\|^2 \\ &\quad=\frac{1}{2}\sum\limits_{j=1}^J \|({\bf u}^{k+1}-{\bf u}^{k})\cdot (f^{k+1,j}\odot G^{k+1,j}) \\ &\qquad+{\bf u}^{k}\cdot (f^{k+1,j}\odot G^{k+1,j}-f^{k,j}\odot G^{k,j})\|^2 \\ &\quad\leq \frac{1}{2} \sum\limits_{j=1}^J(n(n+C_0)\|\Updelta {\bf u}^{k}\|^2 \\ &\qquad+C_0(n+C_0)\|f^{k+1,j}\odot G^{k+1,j}-f^{k,j}\odot G^{k,j}\|^2)\\ &\quad=\frac{1}{2} \sum\limits_{j=1}^J(n(n+C_0)\|\Updelta {\bf u}^{k}\|^2+C_0(n+C_0) \\ &\qquad\cdot\|(f^{k+1,j}-f^{k,j})\odot G^{k+1,j}+f^{k,j}\odot (G^{k+1,j}-G^{k,j})\|^2)\\ &\quad\leq \frac{1}{2} \sum\limits_{j=1}^J(n(n+C_0)\|\Updelta {\bf u}^{k}\|^2+2nC_0(n+C_0) \|\varphi^{k,j}\|^2 \\ &\qquad+2nC_0(n+C_0)\|\psi^{k,j}\|^2)\\ &\quad \leq \frac{1}{2} J n(n+C_0)\eta^2 \left\|\frac{\partial E({\bf W}^k)}{\partial{\bf u}}\right\|^2 \\ &\qquad+JnM^2C_0(n+C_0)\eta^2\sum\limits_{i=1}^n \left\|\frac{\partial E({\bf W}^k)}{\partial{\bf v}_{\bf i}}\right\|^2 \\ &\qquad+ 2JnM_1^2C_0C_{21}^2(n+C_0) \eta^2\sum\limits_{i=1}^n\left(\left\| \frac{\partial E({\bf W}^k)} {\partial{\bf a}_{\bf i}}\right\|^2+ \left\|\frac{\partial E({\bf W}^k)}{\partial{\bf b}_{\bf i}}\right\|^2\right) \\ &\quad= C_8\eta^2\left\|\frac{\partial E({\bf W}^k)}{\partial{\bf W}}\right\|^2\\ \end{aligned} $$

where \(C_8=\frac{1}{2}Jn(n+C_0) \hbox{max}\{1,2M^2C_0,4M_1^2C_0C_{21}^2\}\). Lemma 2 (31) is then proved. This completes the whole proof of Lemma 2. \(\square\)


About this article

Cite this article

Qu, Y., Shang, C., Yang, J. et al. Modified gradient-based learning for local coupled feedforward neural networks with Gaussian basis function. Neural Comput & Applic 22 (Suppl 1), 379–394 (2013). https://doi.org/10.1007/s00521-012-0910-9
