
A Smoothing Algorithm with Constant Learning Rate for Training Two Kinds of Fuzzy Neural Networks and Its Convergence


Abstract

In this paper, a smoothing algorithm with constant learning rate is presented for training two kinds of fuzzy neural networks (FNNs): max-product and max-min FNNs. Some weak and strong convergence results for the algorithm are provided with the error function monotonically decreasing, its gradient going to zero, and weight sequence tending to a fixed value during the iteration. Furthermore, conditions for the constant learning rate are specified to guarantee the convergence. Finally, three numerical examples are given to illustrate the feasibility and efficiency of the algorithm and to support the theoretical findings.


References

1. Baruch IS, Lopez RB, Guzman J-LO, Flores JM (2008) A fuzzy-neural multi-model for nonlinear systems identification and control. Fuzzy Sets Syst 159:2650–2667
2. Hengjie S, Chunyan M, Zhiqi S, Yuan M, Lee B-S (2009) A fuzzy neural network with fuzzy impact grades. Neurocomputing 72:3098–3122
3. Castro JR, Castillo O, Melin P, Rodríguez-Díaz A (2009) A hybrid learning algorithm for a class of interval type-2 fuzzy neural networks. Inf Sci 179:2175–2193
4. Juang C-F, Lin Y-Y, Tu C-C (2010) A recurrent self-evolving fuzzy neural network with local feedbacks and its application to dynamic system processing. Fuzzy Sets Syst 161:2552–2568
5. Khajeh A, Modarress H (2010) Prediction of solubility of gases in polystyrene by adaptive neuro-fuzzy inference system and radial basis function neural network. Expert Syst Appl 37:3070–3074
6. Nandedkar AV, Biswas PK (2007) A fuzzy min-max neural network classifier with compensatory neuron architecture. IEEE Trans Neural Netw 18:42–54
7. Sonule PM, Shetty BS (2017) An enhanced fuzzy min-max neural network with ant colony optimization based-rule-extractor for decision making. Neurocomputing 239:204–213
8. Peeva K (2013) Resolution of fuzzy relational equations-method, algorithm and software with applications. Inf Sci 234(10):44–63
9. Brouwer RK (2002) A discrete fully recurrent network of max product units for associative memory and classification. Int J Neural Syst 12(3–4):247–262
10. Wong W-K, Loo C-K, Lim W-S, Tan P-N (2010) Thermal condition monitoring system using log-polar mapping, quaternion correlation and max-product fuzzy neural network classification. Neurocomputing 74(1–3):164–177
11. Li Y, Zhong-Fu W (2008) Fuzzy feature selection based on min-max learning rule and extension matrix. Pattern Recognit 41:217–226
12. Wong W, Loo C, Lim W, Tan P (2010) Thermal condition monitoring system using log-polar mapping, quaternion correlation and max-product fuzzy neural network classification. Neurocomputing 74:164–177
13. Liu J, Ma Y, Zhang H, Hanguang S, Xiao G (2017) A modified fuzzy min-max neural network for data clustering and its application on pipeline internal inspection data. Neurocomputing 238:56–66
14. Mohammed MF, Lim CP (2017) A new hyperbox selection rule and a pruning strategy for the enhanced fuzzy min-max neural network. Neural Netw 86:69–79
15. Shinde S, Kulkarni U (2016) Extracting classification rules from modified fuzzy min-max neural network for data with mixed attributes. Appl Soft Comput 40:364–378
16. Quteishat A, Lim CP (2008) A modified fuzzy min-max neural network with rule extraction and its application to fault detection and classification. Appl Soft Comput 8:985–995
17. Park J-H, Kim T-H, Sugie T (2011) Output feedback model predictive control for LPV systems based on quasi-min-max algorithm. Automatica 47:2052–2058
18. Iliadis LS, Spartalis S, Tachos S (2008) Application of fuzzy T-norms towards a new artificial neural networks' evaluation framework: a case from wood industry. Inf Sci 178:3828–3839
19. Marks RJ II, Oh S, Arabshahi P, Caudell TP, Choi JJ, Song BG (1992) Steepest descent adaptation of min-max fuzzy if-then rules. In: Proc. IJCNN, Beijing, China, vol. III, pp 471–477
20. Stoeva S, Nikov A (2000) A fuzzy backpropagation algorithm. Fuzzy Sets Syst 112:27–39
21. Nikov A, Stoeva S (2001) Quick fuzzy backpropagation algorithm. Neural Netw 14:231–244
22. Blanco A, Delgado M, Requena I (1995) Identification of fuzzy relational equations by fuzzy neural networks. Fuzzy Sets Syst 71:215–226
23. Zhang X, Hang C-C (1996) The min-max function differentiation and training of fuzzy neural networks. IEEE Trans Neural Netw 7(5):1139–1149
24. Li L, Qiao Z, Liu Y, Chen Y (2017) A convergent smoothing algorithm for training max-min fuzzy neural networks. Neurocomputing 260:404–410
25. Peng J-M, Lin Z (1999) A non-interior continuation method for generalized linear complementarity problems. Math Program Ser A 86:533–563
26. Tong X, Qi L, Felix W, Zhou H (2010) A smoothing method for solving portfolio optimization with CVaR and applications in allocation of generation asset. Appl Math Comput 216:1723–1740
27. Zhang H, Wu W, Liu F, Yao M (2009) Boundedness and convergence of online gradient method with penalty for feedforward neural networks. IEEE Trans Neural Netw 20:1050–1054
28. Wei W, Li L, Yang J, Liu Y (2010) A modified gradient-based neuro-fuzzy learning algorithm and its convergence. Inf Sci 180:1630–1642
29. Shao HM, Zheng GF (2011) Boundedness and convergence of online gradient method with penalty and momentum. Neurocomputing 74:765–770
30. Loetamonphong J, Fang S-C (1999) An efficient solution procedure for fuzzy relation equations with max-product composition. IEEE Trans Fuzzy Syst 7:441–445
31. Yeh CT (2008) On the minimal solutions of max-min fuzzy relational equations. Fuzzy Sets Syst 159:23–39
32. Wang J, Wei W, Zurada JM (2011) Deterministic convergence of conjugate gradient method for feedforward neural networks. Neurocomputing 74:2368–2376


Acknowledgements

This project is partially supported by the Natural Science Foundation of China (11401185), the Hunan Provincial Natural Science Foundation of China (2017JJ2011, 14JJ6039), the Scientific Research Fund of Hunan Provincial Education Department (17A031, 13B004), the Science and Technology Plan Project of Hunan Province (Hunan Provincial Key Laboratory of Intelligent Information Processing and Application, 2016TP1020) and China Scholarship Council.

Author information


Correspondence to Long Li.


Appendix


First, let us estimate a matrix norm in preparation for the proof of our convergence theorems.

According to (12) and (13), all second partial derivatives of \(\widetilde{E}(w)\) exist on \(\mathbb {R}^n\) for any \(t>0\), and the second partial derivative \(\frac{\partial ^2 \widetilde{E}(w) }{\partial w_i\partial w_j}\) is given by

$$\begin{aligned} \frac{\partial ^2 \widetilde{E}(w) }{\partial w_i\partial w_j}= & {} \sum \limits _{s=1}^S\left\{ \lambda _j^s(w,t)\lambda _i^s(w,t)x^s_jx^s_i+\frac{1}{t} \left[ T^s-\widetilde{g}^s(w,t)\right] \lambda _j^s(w,t)\lambda _i^s(w,t)x^s_jx^s_i\right\} \nonumber \\= & {} \sum \limits _{s=1}^S \lambda _j^s(w,t)\lambda _i^s(w,t)x^s_jx^s_i\left[ 1+\frac{1}{t} (T^s-\widetilde{g}^s(w,t))\right] \end{aligned}$$
(22)

where \(\lambda _i^s(w,t)=\frac{\exp \left( x^s_iw_i/t\right) }{\sum \nolimits _{j=1}^n\exp \left( x^s_jw_j/t\right) }\), \(\widetilde{g}^s(w,t)=t\ln \sum \nolimits _{i=1}^n \exp \left( \frac{x_i^sw_i}{t}\right) \).

Then the Hessian matrix H(w) of \(\widetilde{E}(w)\) is the \(n\times n\) matrix defined by

$$\begin{aligned} H(w)=\left( \frac{\partial ^2 \widetilde{E}(w) }{\partial w_i\partial w_j}\right) _{n\times n} \end{aligned}$$
(23)

It is easy to verify that all second partial derivatives of \(\widetilde{E}(w)\) are continuous and that, for all \(i,j=1,2,\ldots ,n\),

$$\begin{aligned} \frac{\partial ^2 \widetilde{E}(w) }{\partial w_i\partial w_j}=\frac{\partial ^2 \widetilde{E}(w) }{\partial w_j\partial w_i} \end{aligned}$$

Therefore, the Hessian matrix H(w) is a real symmetric matrix.
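As a quick illustration of the smoothing used here (our own sketch, not part of the paper; the sample vector x and the weight vector w below are made up), recall that \(\widetilde{g}^s(w,t)=t\ln \sum _{i}\exp (x^s_iw_i/t)\) is the log-sum-exp approximation of \(\max _i(x^s_iw_i)\), and that the coefficients \(\lambda _i^s(w,t)\) are the corresponding softmax weights. As \(t\rightarrow 0^+\) the smoothed value approaches the max and \(\lambda ^s\) concentrates on the maximizing index:

```python
import numpy as np

def g_tilde(w, x, t):
    """Smoothed max-product output: t * ln(sum_i exp(x_i * w_i / t))."""
    z = x * w / t
    zmax = z.max()                      # shift before exponentiating, for numerical stability
    return t * (zmax + np.log(np.exp(z - zmax).sum()))

def lam(w, x, t):
    """Softmax coefficients lambda_i(w, t) = exp(x_i w_i / t) / sum_j exp(x_j w_j / t)."""
    z = x * w / t
    e = np.exp(z - z.max())
    return e / e.sum()

# hypothetical single sample and weight vector, for illustration only
x = np.array([0.9, 0.3, 0.7, 0.5])
w = np.array([0.4, 0.8, 0.6, 0.2])

for t in (1.0, 0.1, 0.01):
    print(t, g_tilde(w, x, t), np.max(x * w))   # g_tilde -> max_i(x_i * w_i) as t -> 0+
print(lam(w, x, 0.01))                          # nearly all mass on the maximizing index
```

Note that \(\partial \widetilde{g}^s/\partial w_i=\lambda _i^s(w,t)x_i^s\), which is the source of the products \(\lambda _j^s\lambda _i^sx_j^sx_i^s\) appearing in (22).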

Suppose that \(\lambda _1(w),\lambda _2(w),\ldots ,\lambda _n(w)\) are the n real eigenvalues of the matrix H(w). The norm \(\Vert \cdot \Vert _2\), defined for a matrix A by

$$\begin{aligned} \Vert {A}\Vert _2=\max \{\sqrt{\lambda }:\ \lambda \text{ is an eigenvalue of } A^TA\} \end{aligned}$$

is taken as the matrix norm in the following discussion. We can obtain an estimate of the norm of the matrix H(w), as stated in Lemma 2.
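The identity \(\Vert H(w)\Vert _2=\max _i|\lambda _i(w)|\) used in the proof of Lemma 2 below can also be checked numerically. The following short sketch (our own illustration with a random symmetric matrix, not taken from the paper) computes \(\Vert A\Vert _2\) from the eigenvalues of \(A^TA\) and compares it with the largest absolute eigenvalue:

```python
import numpy as np

def spectral_norm(A):
    """||A||_2 = max{ sqrt(lam) : lam an eigenvalue of A^T A }."""
    return np.sqrt(np.linalg.eigvalsh(A.T @ A).max())

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
H = (B + B.T) / 2                              # a random real symmetric matrix

eigs = np.linalg.eigvalsh(H)                   # its real eigenvalues
print(spectral_norm(H), np.abs(eigs).max())    # the two numbers coincide
```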

Lemma 2

Let H(w) be the Hessian matrix of \(\widetilde{E}(w)\) defined by (22) and (23). Then there exists a constant \(C_1\) such that, for all \(w\in D\),

$$\begin{aligned} \Vert H(w)\Vert _2\leqslant C_1 \end{aligned}$$

where D is a bounded set.

Proof

Since H(w) is a real symmetric matrix with real eigenvalues \(\lambda _1(w),\lambda _2(w),\ldots ,\lambda _n(w)\), there exists an orthogonal matrix Q such that

$$\begin{aligned} H(w)=Q^T\mathrm {diag}\left( \lambda _1(w),\lambda _2(w),\ldots ,\lambda _n(w)\right) Q \end{aligned}$$

Then

$$\begin{aligned} \left( H(w)\right) ^TH(w)=Q^T\mathrm {diag}\left( \lambda _1^2(w),\lambda _2^2(w),\ldots ,\lambda _n^2(w)\right) Q \end{aligned}$$

Hence all eigenvalues of \(\left( H(w)\right) ^TH(w)\) are

$$\begin{aligned} \lambda _1^2(w),\lambda _2^2(w),\ldots ,\lambda _n^2(w) \end{aligned}$$

Thus, we have

$$\begin{aligned} \Vert H(w)\Vert _2=\max \{|\lambda _i(w)|:i=1,2,\ldots ,n\} \end{aligned}$$

It follows from (22), (23) and the boundedness of the set D that \(\lambda _i(w)\) is bounded for all \(i=1,2,\ldots ,n\) and all \(w\in D\); namely, there exists a constant \(C_1\) such that

$$\begin{aligned} |\lambda _i(w)|\leqslant C_1 \end{aligned}$$

for all \(i=1,2,\ldots ,n\) and all \(w\in D\). Therefore, we have

$$\begin{aligned} \Vert H(w)\Vert _2\leqslant C_1 \end{aligned}$$

This completes the proof of Lemma 2. \(\square \)
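To see Lemma 2 at work numerically (our own sketch; the training inputs X, the targets T, the smoothing parameter t and the box \([-1,1]^n\) standing in for the bounded set D are all made-up assumptions, not the paper's examples), one can assemble H(w) entry-wise from (22) and scan a bounded weight region; the spectral norm stays below a fixed constant, which plays the role of \(C_1\):

```python
import numpy as np

def hessian(w, X, T, t):
    """Hessian of the smoothed error, built entry-wise as in Eq. (22)."""
    n = w.size
    H = np.zeros((n, n))
    for x_s, T_s in zip(X, T):
        z = x_s * w / t
        zmax = z.max()
        e = np.exp(z - zmax)
        lam = e / e.sum()                           # lambda_i^s(w, t)
        g = t * (zmax + np.log(e.sum()))            # g~^s(w, t)
        H += np.outer(lam * x_s, lam * x_s) * (1.0 + (T_s - g) / t)
    return H

rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(10, 4))             # 10 hypothetical samples
T = rng.uniform(0.0, 1.0, size=10)                  # hypothetical targets
t = 0.5

norms = [np.linalg.norm(hessian(rng.uniform(-1, 1, size=4), X, T, t), 2)
         for _ in range(200)]                       # sample the bounded region D
print(max(norms))                                   # an empirical bound playing the role of C_1
```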

The following lemma is also crucial for the proof of our convergence theorems. This result is almost the same as Lemma 5.3 in [32], so the details of its proof are omitted.

Lemma 3

Suppose that \(F : \mathbb {R}^p\rightarrow \mathbb {R}^q\) (\(p\ge 1,\ q\ge 1\)) is continuous on a bounded closed region \(\mathbf{D}\subset \mathbb {R}^p\), that \(\mathbf{D_0}=\{\mathbf {z}\in \mathbf{D}:\ F(\mathbf {z})=0\}\), and that the projection of \(\mathbf{D_0}\) on each coordinate axis does not contain any interior point. If a sequence \(\{\mathbf {z}^k\}\subset \mathbf{D}\) satisfies

$$\begin{aligned} \lim \limits _{k\rightarrow \infty } \Vert F(\mathbf {z}^k)\Vert =0,\ \ \lim \limits _{k\rightarrow \infty }\Vert \mathbf {z}^{k+1}-\mathbf {z}^k\Vert =0, \end{aligned}$$

then there exists a unique \(\mathbf {z}^*\in \mathbf{D_0}\) such that \(\lim \nolimits _{k\rightarrow \infty }\mathbf {z}^k=\mathbf {z}^*\).

Now we are ready to prove the main theorems using the above two lemmas.

Proof of Theorem 1

The proof is divided into four parts, dealing with (18)–(21) respectively.

Proof of (18). Expanding \(\widetilde{E}(w^{k+1})\) by Taylor's formula, we have, for all \(k=0,1,2,\ldots \),

$$\begin{aligned} \widetilde{E}(w^{k+1})-\widetilde{E}(w^{k})= & {} (\nabla \widetilde{E}(w^{k}))^T\Delta w^{k}+\frac{1}{2}(\Delta w^{k})^TH(\xi )(\Delta w^{k})\\\leqslant & {} -\eta \Vert \nabla \widetilde{E}(w^{k})\Vert ^2+\frac{\Vert H(\xi )\Vert }{2}\eta ^2 \Vert \nabla \widetilde{E}(w^{k})\Vert ^2\\= & {} -\left( \eta -\frac{\Vert H(\xi )\Vert }{2}\eta ^2\right) \Vert \nabla \widetilde{E}(w^{k})\Vert ^2 \end{aligned}$$

where \(\xi \) lies between \(w^k\) and \(w^{k+1}\). Write \(\alpha =\eta -\frac{\Vert H(\xi )\Vert }{2}\eta ^2\). Then

$$\begin{aligned} \widetilde{E}(w^{k+1})\leqslant \widetilde{E}(w^{k})-\alpha \Vert \nabla \widetilde{E}(w^{k})\Vert ^2 \end{aligned}$$
(24)

We require the learning rate \(\eta \) to satisfy

$$\begin{aligned} 0<\eta \leqslant \frac{1}{C} \end{aligned}$$
(25)

where \(C=\frac{C_1}{2}\). By virtue of Lemma 2, we have

$$\begin{aligned} \alpha =\eta -\frac{\Vert H(\xi )\Vert }{2}\eta ^2=\eta \left( 1-\frac{\Vert H(\xi )\Vert }{2}\eta \right) \geqslant 0 \end{aligned}$$

This together with (24) leads to

$$\begin{aligned} \widetilde{E}(w^{k+1})\leqslant \widetilde{E}(w^k),\quad k=0,1,2,\ldots \end{aligned}$$
(26)

This completes the proof of (18).
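The following self-contained sketch makes the descent step behind this argument concrete (our own illustration, not the paper's experiments; the data, the dimensions, the smoothing parameter t and the value \(\eta =0.01\) are assumptions, and \(\widetilde{E}(w)\) is taken as \(\frac{1}{2}\sum _s(T^s-\widetilde{g}^s(w,t))^2\), which is consistent with the second-derivative formula (22)). Iterating \(w^{k+1}=w^k-\eta \nabla \widetilde{E}(w^k)\) with a constant learning rate small enough to satisfy (25) produces a non-increasing error sequence, as (18) asserts, and a shrinking gradient norm, anticipating (20):

```python
import numpy as np

def g_tilde(w, x, t):
    z = x * w / t
    zmax = z.max()
    return t * (zmax + np.log(np.exp(z - zmax).sum()))

def E_tilde(w, X, T, t):
    # smoothed error, assumed here as 0.5 * sum_s (T^s - g~^s(w, t))^2
    return 0.5 * sum((T_s - g_tilde(w, x_s, t)) ** 2 for x_s, T_s in zip(X, T))

def grad_E_tilde(w, X, T, t):
    g = np.zeros_like(w)
    for x_s, T_s in zip(X, T):
        z = x_s * w / t
        e = np.exp(z - z.max())
        lam = e / e.sum()                             # lambda_i^s(w, t)
        g -= (T_s - g_tilde(w, x_s, t)) * lam * x_s   # since d g~/d w_i = lambda_i x_i
    return g

rng = np.random.default_rng(2)
X = rng.uniform(0.0, 1.0, size=(20, 5))               # hypothetical training inputs
T = rng.uniform(0.0, 1.0, size=20)                    # hypothetical targets
t, eta = 0.5, 0.01                                    # eta kept small, in the spirit of (25)

w = rng.uniform(-1.0, 1.0, size=5)
errors = []
for k in range(2000):
    w = w - eta * grad_E_tilde(w, X, T, t)            # w^{k+1} = w^k - eta * grad E~(w^k)
    errors.append(E_tilde(w, X, T, t))

print(all(e2 <= e1 + 1e-10 for e1, e2 in zip(errors, errors[1:])))  # monotone decrease, cf. (18)
print(np.linalg.norm(grad_E_tilde(w, X, T, t)))                     # small gradient norm, cf. (20)
```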

Proof of (19). By the definition of \(\widetilde{E}(w)\) in (12), it is easy to see that \(\widetilde{E}(w^k)\ge 0\) for \(k=0,1,2,\ldots \). Combining this with (26), we conclude by the monotone convergence theorem that

$$\begin{aligned} \lim \limits _{k\rightarrow \infty }\widetilde{E}(w^k)=\widetilde{E}^* \end{aligned}$$
(27)

where \(\widetilde{E}^*=\inf _{w\in \Omega }\widetilde{E}(w)\). This proves (19).

Proof of (20). From (24) we obtain

$$\begin{aligned} \widetilde{E}(w^{k+1})\leqslant \widetilde{E}(w^{k})-\alpha \Vert \nabla \widetilde{E}(w^{k})\Vert ^2 \leqslant \cdots \leqslant \widetilde{E}(w^{0})-\alpha \sum \limits _{t=0}^k\Vert \nabla \widetilde{E}(w^{t})\Vert ^2 \end{aligned}$$

Since \(\widetilde{E}(w^{k+1})\geqslant 0\), we have

$$\begin{aligned} \alpha \sum \limits _{t=0}^k\Vert \nabla \widetilde{E}(w^{t})\Vert ^2\leqslant \widetilde{E}(w^{0}) \end{aligned}$$

Letting \(k\rightarrow \infty \) results in

$$\begin{aligned} \sum \limits _{t=0}^\infty \Vert \nabla \widetilde{E}(w^{t})\Vert ^2\leqslant \frac{1}{\alpha }\widetilde{E}(w^{0})<\infty \end{aligned}$$

This immediately gives

$$\begin{aligned} \lim \limits _{k\rightarrow \infty }\Vert \nabla \widetilde{E}(w^k)\Vert =0 \end{aligned}$$
(28)

This proves (20).

Proof of (21). By virtue of (14) and (28), we obtain

$$\begin{aligned} \lim \limits _{k\rightarrow \infty }\Vert w^{k+1}-w^{k}\Vert = \lim \limits _{k\rightarrow \infty }\Vert -\eta \nabla \widetilde{E}(w^k)\Vert =0. \end{aligned}$$
(29)

Note that the error function \(\widetilde{E}(w)\) defined in (12) is continuous and differentiable. According to (13) and (15), it is easy to verify that \(\nabla \widetilde{E}(w)\) is continuous on \(\mathbb {R}^n\). Since (28), (29) and Assumption (A3) hold, it follows immediately from Lemma 3 that (21) holds, that is, there exists a unique \(w^*\in \Omega \) such that

$$\begin{aligned} \lim \limits _{k\rightarrow \infty }w^k=w^*. \end{aligned}$$

This completes the proof of Theorem 1. \(\square \)

Proof of Theorem 2

Using the Taylor expansion, we also have, for all \(k=0,1,2,\ldots \),

$$\begin{aligned} \overline{E}(w^{k+1})-\overline{E}(w^{k})=(\nabla \overline{E}(w^{k}))^T\Delta w^{k}+\frac{1}{2} (\Delta w^{k})^TH(\xi )(\Delta w^{k}) \end{aligned}$$

where \(\xi \) lies between \(w^k\) and \(w^{k+1}\). Since the Hessian matrices of \(\overline{E}(w)\) and \(\widetilde{E}(w)\) have the same mathematical properties, the proof of Theorem 1 carries over and yields the same results for the error function \(\overline{E}(w)\) and the sequence \(\{w^k\}\) generated by SAMM. This completes the proof of Theorem 2. \(\square \)


Cite this article

Li, L., Qiao, Z. & Long, Z. A Smoothing Algorithm with Constant Learning Rate for Training Two Kinds of Fuzzy Neural Networks and Its Convergence. Neural Process Lett 51, 1093–1109 (2020). https://doi.org/10.1007/s11063-019-10135-4
