
A Modified Learning Algorithm for Interval Perceptrons with Interval Weights


Abstract

In many applications, it is natural to use interval data to describe various kinds of uncertainty. This paper is concerned with a one-layer interval perceptron whose weights and outputs are intervals and whose inputs are real numbers. In the original learning method for this interval perceptron, an absolute value function is applied to the newly learned radii of the interval weights so as to force the radii to be positive. This approach seems somewhat unnatural and, as indicated in our numerical experiments, may cause oscillation in the learning procedure. In this paper, a modified learning method is proposed for this one-layer interval perceptron. Instead of using the absolute value function, we replace, in the error function, the radius of each interval weight by a quadratic term. This simple trick incurs no additional computational cost in the learning procedure, but it brings three advantages. First, the radii of the interval weights are guaranteed to remain positive during learning without the help of the absolute value function. Second, the oscillation mentioned above is eliminated and the convergence of the learning procedure is improved, as indicated by our numerical experiments. Finally, as a by-product, the convergence analysis of the learning procedure becomes straightforward, whereas the analysis of the original learning method is difficult, if not impossible, due to the non-smoothness of the absolute value function involved.
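
To make the modification concrete, the following Python sketch illustrates the idea described above: the radius of the output interval is produced through the squares of the radius weights \(v_i\), so positivity needs no absolute value and the error function stays smooth. The variable names, the weighting parameter beta, and the batch gradient-descent update are reconstructed from the equations in the Appendix and should be read as an illustrative assumption, not the authors' reference implementation.

```python
import numpy as np

# Sketch of the modified one-layer interval perceptron (assumed notation):
#   center output  y_C = u . x
#   radius output  y_R = sum_i v_i**2 * |x_i|   <- quadratic term keeps the radius >= 0
#   error          E = 0.5 * sum_j [ beta*(o_C_j - y_C_j)**2 + (1-beta)*(o_R_j - y_R_j)**2 ]

def forward(u, v, x):
    """Interval output (center, radius) for a real input vector x."""
    return u @ x, (v ** 2) @ np.abs(x)

def gradients(u, v, X, o_center, o_radius, beta):
    """Batch gradients of the quadratic error E with respect to u and v."""
    y_center = X @ u                     # shape (J,)
    y_radius = np.abs(X) @ (v ** 2)      # shape (J,)
    grad_u = X.T @ (beta * (y_center - o_center))
    grad_v = 2.0 * (1.0 - beta) * (np.abs(X).T @ (y_radius - o_radius)) * v
    return grad_u, grad_v

def train_step(u, v, X, o_center, o_radius, beta=0.5, eta=1e-3):
    """One batch gradient-descent step; v may change sign, v**2 stays non-negative."""
    grad_u, grad_v = gradients(u, v, X, o_center, o_radius, beta)
    return u - eta * grad_u, v - eta * grad_v
```

Because the radius enters the model only through \(v_i^2\), the learned radii cannot become negative and no non-smooth correction of the update is needed.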


References

1. Han M, Fan JC, Wang J (2011) A dynamic feedforward neural network based on Gaussian particle swarm optimization and its application for predictive control. IEEE Trans Neural Netw 22:1457–1468
2. Ren XM, Lv XH (2011) Identification of extended Hammerstein systems using dynamic self-optimizing neural networks. IEEE Trans Neural Netw 22:1169–1179
3. Wade JJ, McDaid LJ, Santos JA, Sayers HM (2010) SWAT: a spiking neural network training algorithm for classification problems. IEEE Trans Neural Netw 21:1817–1830
4. Zhang NM (2011) Momentum algorithms in neural networks and the applications in numerical algebra. In: AIMSEC, pp 2192–2195
5. Heshmaty B, Kandel A (1985) Fuzzy linear regression and its applications to forecasting in uncertain environment. Fuzzy Sets Syst 15:159–191
6. Kaneyoshi M, Tanaka H, Kamei M, Farata H (1990) New system identification technique using fuzzy regression analysis. In: International symposium on uncertainty modeling and analysis, College Park, MD, USA, pp 528–533
7. Hashiyama T, Furuhash T, Uchikawa Y (1992) An interval fuzzy model using a fuzzy neural network. In: IEEE international conference on neural networks, Baltimore, MD, USA, pp 745–750
8. Ishibuchi H, Tanaka H (1992) Fuzzy regression analysis using neural networks. Fuzzy Sets Syst 50:257–265
9. Ishibuchi H, Tanaka H, Okada H (1993) An architecture of neural networks with interval weights and its application to fuzzy regression analysis. Fuzzy Sets Syst 57:27–39
10. Ishibuchi H, Nii M (2001) Fuzzy regression using asymmetric fuzzy coefficients and fuzzified neural networks. Fuzzy Sets Syst 119:273–290
11. Hernandez CA, Espf J, Nakayama K, Fernandez M (1993) Interval arithmetic backpropagation. In: Proceedings of the 1993 international joint conference on neural networks, pp 375–378
12. Drago GP, Ridella S (1998) Pruning with interval arithmetic perceptron. Neurocomputing 18:229–246
13. Drago GP, Ridella S (1999) Possibility and necessity pattern classification using an interval arithmetic perceptron. Neural Comput Appl 8:40–52
14. Roque AMS, Mate C, Arroyo J, Sarabia A (2007) iMLP: applying multi-layer perceptrons to interval-valued data. Neural Process Lett 25:157–169
15. Shao HM, Zheng GF (2011) Convergence analysis of a back-propagation algorithm with adaptive momentum. Neurocomputing 74:749–752
16. Wang J, Yang J, Wu W (2011) Convergence of cyclic and almost-cyclic learning with momentum for feedforward neural networks. IEEE Trans Neural Netw 22:1297–1306
17. Xu ZB, Zhang R, Jing WF (2009) When on-line BP training converges. IEEE Trans Neural Netw 20:1529–1539
18. Moore RE (1966) Interval analysis. Prentice-Hall, Englewood Cliffs, NJ
19. Sunaga T (1958) Theory of an interval algebra and its applications to numerical analysis. RAAG Mem 2:29–46
20. Wu W, Wang J, Cheng MS, Li ZX (2011) Convergence analysis of online gradient method for BP neural networks. Neural Netw 24:91–98
21. Xu DP, Zhang HS, Liu LJ (2010) Convergence analysis of three classes of split-complex gradient algorithms for complex-valued recurrent neural networks. Neural Comput 22:2655–2677
22. Yao XF, Wang SD, Dong SQ (2004) Approximation of interval models by neural networks. In: Proceedings of the 2004 IEEE international joint conference on neural networks, vol 2, pp 1027–1032


Acknowledgments

This work is supported by the National Natural Science Foundation of China (11171367) and the Fundamental Research Funds for the Central Universities of China.


Corresponding author

Correspondence to Dakun Yang.

Appendix

First we borrow an important lemma from [20]:

Lemma 1

Let \(\left\{ b_m\right\} \) be a bounded sequence satisfying \(\lim _{m\rightarrow \infty }(b_{m+1}-b_m)=0\). Write \(\gamma _1=\lim _{n\rightarrow \infty }\inf _{m>n} b_m,\,\gamma _2=\lim _{n\rightarrow \infty }\sup _{m>n} b_m\) and \(S=\{a\in {\mathbb {R}}:\) There exists a subsequence \(\{b_{i_k}\}\) of \(\{b_m\}\) such that \(b_{i_k}\rightarrow a \) as \(k\rightarrow \infty \}\). Then we have

$$\begin{aligned} S=[\gamma _1,\gamma _2]. \end{aligned}$$
(36)
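
For instance, the bounded sequence \(b_m=\sin \sqrt{m}\) satisfies the hypotheses of Lemma 1: since \(|\sin a-\sin b|\le |a-b|\),

$$\begin{aligned} |b_{m+1}-b_m|\le \sqrt{m+1}-\sqrt{m}=\frac{1}{\sqrt{m+1}+\sqrt{m}}\rightarrow 0, \end{aligned}$$

and, because \(\sqrt{m}\) modulo \(2\pi \) is dense in \([0,2\pi )\), every point of \([-1,1]\) is a subsequential limit, so that \(S=[\gamma _1,\gamma _2]=[-1,1]\).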

Some useful estimates are gathered in the next two lemmas.

Lemma 2

For any \(k=0,1,2,\ldots \) and \(1\le j\le J\), we have:

$$\begin{aligned}&\displaystyle \sum ^J_{j=1}\beta (y^C_{j,k}-o^C_j)X_j\cdot \Delta \mathbf{u}^k=-\eta \Vert E_{\mathbf{u}}(\mathbf{w}^k)\Vert ^2,\end{aligned}$$
(37)
$$\begin{aligned}&\displaystyle \sum ^n_{i=1}\biggl (\sum ^J_{j=1}2(1-\beta )(y^R_{j,k}-o^R_j)v^k_i|x_{ji}|\biggr )\Delta v^k_i=-\eta \Vert E_{\mathbf{v}}(\mathbf{w}^k)\Vert ^2, \end{aligned}$$
(38)

Proof

These equalities are direct consequences of (24), (25) and (29). \(\square \)
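
The identities (37) and (38) say that the increments produced by the weight updates equal \(-\eta \) times the corresponding squared gradient norms. The gradient formulas underlying them can be checked numerically against central finite differences, as in the following sketch; the error function, data, shapes, and helper names are reconstructed assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def error(u, v, X, o_center, o_radius, beta):
    """E(w) = 0.5 * sum_j [beta*(o_C - y_C)^2 + (1-beta)*(o_R - y_R)^2]."""
    y_center = X @ u
    y_radius = np.abs(X) @ (v ** 2)
    return 0.5 * np.sum(beta * (o_center - y_center) ** 2
                        + (1.0 - beta) * (o_radius - y_radius) ** 2)

def analytic_grads(u, v, X, o_center, o_radius, beta):
    y_center = X @ u
    y_radius = np.abs(X) @ (v ** 2)
    grad_u = X.T @ (beta * (y_center - o_center))
    grad_v = 2.0 * (1.0 - beta) * (np.abs(X).T @ (y_radius - o_radius)) * v
    return grad_u, grad_v

def numeric_grad(f, w, eps=1e-6):
    """Central finite-difference gradient of a scalar function f at w."""
    g = np.zeros_like(w)
    for i in range(w.size):
        d = np.zeros_like(w)
        d[i] = eps
        g[i] = (f(w + d) - f(w - d)) / (2.0 * eps)
    return g

# Hypothetical toy problem: J samples of dimension n.
J, n, beta = 6, 3, 0.5
X = rng.normal(size=(J, n))
o_center, o_radius = rng.normal(size=J), np.abs(rng.normal(size=J))
u, v = rng.normal(size=n), rng.normal(size=n)

gu, gv = analytic_grads(u, v, X, o_center, o_radius, beta)
assert np.allclose(gu, numeric_grad(lambda z: error(z, v, X, o_center, o_radius, beta), u), atol=1e-5)
assert np.allclose(gv, numeric_grad(lambda z: error(u, z, X, o_center, o_radius, beta), v), atol=1e-5)

# With Delta u = -eta*gu and Delta v = -eta*gv, (37) and (38) follow by substitution:
eta = 0.1
assert np.isclose(gu @ (-eta * gu), -eta * np.sum(gu ** 2))
assert np.isclose(gv @ (-eta * gv), -eta * np.sum(gv ** 2))
```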

Lemma 3

Suppose Assumption \((A1)\) holds. Then, for any \(k=0,1,2,\ldots \) and \(1\le j\le J\), we have:

$$\begin{aligned}&\displaystyle \max _{j,k}\left\{ \Vert X_j\Vert ,|o^C_j|,|o^R_j|,|y^C_{j,k}|,|y^R_{j,k}|,\Vert \mathbf{u}^k\Vert ,\Vert \mathbf{v}^k\Vert \right\} \le M_0,\end{aligned}$$
(39)
$$\begin{aligned}&\displaystyle \frac{1}{2}\sum ^J_{j=1}\beta (\Delta \mathbf{u}^{k}\cdot X_j)^2\le M_1\eta ^2\Vert E_{\mathbf{u}}(\mathbf{w}^k)\Vert ^2,\end{aligned}$$
(40)
$$\begin{aligned}&\displaystyle \sum ^J_{j=1}(1-\beta )(y^R_{j,k}-o^R_j)\biggl (\sum ^n_{i=1}(\Delta v^k_i)^2|x_{ji}|\biggr )\le M_2\eta ^2\Vert E_{\mathbf{v}}(\mathbf{w}^k)\Vert ^2,\end{aligned}$$
(41)
$$\begin{aligned}&\displaystyle \frac{1}{2}\sum ^J_{j=1}(1-\beta )\biggl (\sum ^n_{i=1}\Delta v^k_i(v^{k+1}_i+v^k_i)|x_{ji}|\biggr )^2\le M_3\eta ^2\Vert E_{\mathbf{v}}(\mathbf{w}^k)\Vert ^2, \end{aligned}$$
(42)

where \(M_i (i=0,1,2,3)\) are constants independent of \(k\).

Proof

Proof of (39): For the given training sample set, (39) follows easily from Assumption \((A1)\) together with (22) and (23).

Proof of (40): By (29) and (39), (40) follows from

$$\begin{aligned}&\frac{1}{2}\sum ^J_{j=1}\beta (\Delta \mathbf{u}^{k}\cdot X_j)^2 =\frac{\beta \eta ^2}{2}\sum ^J_{j=1}(E_{\mathbf{u}}(\mathbf{w}^k)\cdot X_j)^2\nonumber \\&\quad \le \frac{\beta \eta ^2}{2}\cdot J\Vert E_{\mathbf{u}}(\mathbf{w}^k)\Vert ^2\Vert X_j\Vert ^2 \le \frac{\beta J}{2}M_0^2\eta ^2\Vert E_{\mathbf{u}}(\mathbf{w}^k)\Vert ^2\nonumber \\&\quad =M_1\eta ^2\Vert E_{\mathbf{u}}(\mathbf{w}^k)\Vert ^2, \end{aligned}$$
(43)

where \(M_1=\frac{\beta J}{2}M_0^2\).

Proof of (41): (41) follows from (29), (39) and

$$\begin{aligned}&\sum ^J_{j=1}(1-\beta )(y^R_{j,k}-o^R_j)\biggl (\sum ^n_{i=1}(\Delta v^k_i)^2|x_{ji}|\biggr )\nonumber \\&\quad \le 2J(1-\beta )M^2_0\biggl (\sum ^n_{i=1}(\Delta v^k_i)^2\biggr )\nonumber \\&\quad =2J(1-\beta )M^2_0\eta ^2\Vert E_{\mathbf{v}}(\mathbf{w}^k)\Vert ^2\nonumber \\&\quad =M_2\eta ^2\Vert E_{\mathbf{v}}(\mathbf{w}^k)\Vert ^2, \end{aligned}$$
(44)

where \(M_2=2J(1-\beta )M^2_0\).

Proof of (42): By using (29), (39) and the Cauchy-Schwarz inequality, (42) results from

$$\begin{aligned}&\frac{1}{2}\sum ^J_{j=1}(1-\beta )\biggl (\sum ^n_{i=1}\Delta v^k_i(v^{k+1}_i+v^k_i)|x_{ji}|\biggr )^2\nonumber \\&\quad \le \frac{1}{2}(1-\beta )\sum ^J_{j=1}\biggl (\sum ^n_{i=1}(\Delta v^k_i)^2\biggr )\biggl (\sum ^n_{i=1}((v^{k+1}_i+v^k_i)|x_{ji}|)^2\biggr )\nonumber \\&\quad \le J(1-\beta )M^4_0\biggl (\sum ^n_{i=1}(\Delta v^k_i)^2\biggr )\nonumber \\&\quad =J(1-\beta )M^4_0\eta ^2\Vert E_{\mathbf{v}}(\mathbf{w}^k)\Vert ^2\nonumber \\&\quad =M_3\eta ^2\Vert E_{\mathbf{v}}(\mathbf{w}^k)\Vert ^2, \end{aligned}$$
(45)

where \(M_3=J(1-\beta )M^4_0\). This completes the proof of Lemma 3. \(\square \)

Now we are ready to prove Theorem 1.

Proof of Theorem 1

By (22) and (28), we can get

$$\begin{aligned} y^C_{j,k+1}-y^C_{j,k}=\mathbf{u}^{k+1}\cdot X_j-\mathbf{u}^{k}\cdot X_j=X_j\cdot \Delta \mathbf{u}^{k}. \end{aligned}$$
(46)

By (23) and (28), we obtain that

$$\begin{aligned} y^R_{j,k+1}-y^R_{j,k}&= \sum ^n_{i=1}((v^{k+1}_i)^2-(v^k_i)^2)|x_{ji}|\nonumber \\&= \sum ^n_{i=1}(2v^k_i(v^{k+1}_i-v^k_i)+(v^{k+1}_i-v^k_i)^2)|x_{ji}|\nonumber \\&= \sum ^n_{i=1}2v^k_i|x_{ji}|\Delta v^k_i+\sum ^n_{i=1}(\Delta v^k_i)^2|x_{ji}|. \end{aligned}$$
(47)

Using Taylor's expansion and Lemmas 2 and 3, for any \(k=0,1,2,\ldots \), we have

$$\begin{aligned} E(\mathbf{w}^{k+1})-E(\mathbf{w}^k)&= \frac{1}{2}\sum ^J_{j=1}\biggl (\beta ((o^C_j-y^C_{j,k+1})^2-(o^C_j-y^C_{j,k})^2)\nonumber \\&+\,(1-\beta )((o^R_j-y^R_{j,k+1})^2-(o^R_j-y^R_{j,k})^2)\biggr )\nonumber \\&= \frac{1}{2}\sum ^J_{j=1}\biggl (\beta (2(o^C_j-y^C_{j,k})-y^C_{j,k+1}+y^C_{j,k})(y^C_{j,k}-y^C_{j,k+1})\nonumber \\&+\,(1-\beta )(2(o^R_j-y^R_{j,k})-y^R_{j,k+1}+y^R_{j,k})(y^R_{j,k}-y^R_{j,k+1})\biggr )\nonumber \\&= \sum ^J_{j=1}\biggl (\beta (y^C_{j,k}-o^C_j)(y^C_{j,k+1}-y^C_{j,k})+\frac{1}{2}\beta (y^C_{j,k+1}-y^C_{j,k})^2\nonumber \\&+\,(1-\beta )(y^R_{j,k}-o^R_j)(y^R_{j,k+1}-y^R_{j,k})\nonumber \\&+\,\frac{1}{2}(1-\beta )(y^R_{j,k+1}-y^R_{j,k})^2\biggr )\nonumber \\&= \sum ^J_{j=1}\beta (y^C_{j,k}-o^C_j)X_j\cdot \Delta \mathbf{u}^{k}+\frac{\beta }{2}\sum ^J_{j=1}(X_j\cdot \Delta \mathbf{u}^{k})^2\nonumber \\&+\sum ^n_{i=1}\biggl (\sum ^J_{j=1}2(1-\beta )(y^R_{j,k}-o^R_j)v^k_i|x_{ji}|\biggr )\Delta v^k_i\nonumber \\&+\,(1-\beta )\sum ^J_{j=1}(y^R_{j,k}-o^R_j)\sum ^n_{i=1}(\Delta v^k_i)^2|x_{ji}|\nonumber \\&+\,\frac{1}{2}(1-\beta )\sum ^J_{j=1}\biggl (\sum ^n_{i=1}\Delta v^k_i(v^{k+1}_i+v^k_i)|x_{ji}|\biggr )^2\nonumber \\&\le -\,\eta \Vert E_{\mathbf{u}}(\mathbf{w}^k)\Vert ^2+M_2\eta ^2\Vert E_{\mathbf{v}}(\mathbf{w}^k)\Vert ^2-\eta \Vert E_{\mathbf{v}}(\mathbf{w}^k)\Vert ^2\nonumber \\&+M_1\eta ^2\Vert E_{\mathbf{u}}(\mathbf{w}^k)\Vert ^2+M_3\eta ^2\Vert E_{\mathbf{v}}(\mathbf{w}^k)\Vert ^2\nonumber \\&\le -\,(\eta -M_4\eta ^2)(\Vert E_{\mathbf{u}}(\mathbf{w}^k)\Vert ^2+\Vert E_{\mathbf{v}}(\mathbf{w}^k)\Vert ^2)\nonumber \\&= -\,(\eta -M_4\eta ^2)\Vert E_{\mathbf{w}}(\mathbf{w}^k)\Vert ^2\nonumber \\&= -\,\gamma \Vert E_{\mathbf{w}}(\mathbf{w}^k)\Vert ^2, \end{aligned}$$
(48)

where \(M_4=\max \{M_1,M_2+M_3\}\) and \(\gamma =\eta -M_4\eta ^2\).

We require the learning rate \(\eta \) to satisfy

$$\begin{aligned} 0<\eta <\frac{1}{M_4}. \end{aligned}$$
(49)
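
Indeed, \(\gamma =\eta -M_4\eta ^{2}=\eta (1-M_4\eta )\) is positive precisely when \(0<\eta <1/M_4\), and it attains its largest value \(\gamma =1/(4M_4)\) at \(\eta =1/(2M_4)\).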

Together with (48), this leads to

$$\begin{aligned} E(\mathbf{w}^{k+1})\le E(\mathbf{w}^k),\quad k=0,1,2,\ldots . \end{aligned}$$
(50)

(32) is thus proved.
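
The monotonicity in (50) is easy to observe numerically: with a learning rate below the threshold in (49), the error never increases from one iteration to the next. The following sketch runs batch gradient descent on hypothetical toy data (all names and values are illustrative) and checks that the error sequence is non-increasing.

```python
import numpy as np

rng = np.random.default_rng(1)

def error_and_grads(u, v, X, o_center, o_radius, beta):
    y_center = X @ u
    y_radius = np.abs(X) @ (v ** 2)
    E = 0.5 * np.sum(beta * (o_center - y_center) ** 2
                     + (1.0 - beta) * (o_radius - y_radius) ** 2)
    grad_u = X.T @ (beta * (y_center - o_center))
    grad_v = 2.0 * (1.0 - beta) * (np.abs(X).T @ (y_radius - o_radius)) * v
    return E, grad_u, grad_v

# Hypothetical interval-valued training data: centers o_center and radii o_radius.
J, n, beta = 20, 4, 0.5
X = rng.normal(size=(J, n))
o_center = rng.normal(size=J)
o_radius = np.abs(rng.normal(size=J))
u, v = rng.normal(size=n), rng.normal(size=n)

eta = 1e-4          # small enough for this toy problem (cf. the bound (49))
errors = []
for k in range(500):
    E, grad_u, grad_v = error_and_grads(u, v, X, o_center, o_radius, beta)
    errors.append(E)
    u -= eta * grad_u
    v -= eta * grad_v

# Monotone decrease as in (50): E(w^{k+1}) <= E(w^k) for every k.
assert all(b <= a + 1e-12 for a, b in zip(errors, errors[1:]))
```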

By (48), we can get

$$\begin{aligned} E(\mathbf{w}^{k+1})&\le E(\mathbf{w}^k)-\gamma \Vert E_{\mathbf{w}}(\mathbf{w}^k)\Vert ^2\\&\le \cdots \le E(\mathbf{w}^0)-\gamma \sum ^k_{t=0}\Vert E_{\mathbf{w}}(\mathbf{w}^t)\Vert ^2 \end{aligned}$$

Since \(E(\mathbf{w}^{k+1})\ge 0\), we have

$$\begin{aligned} \sum ^{k}_{t=0}\Vert E_{\mathbf{w}}(\mathbf{w}^t)\Vert ^2\le \frac{1}{\gamma }E(\mathbf{w}^0) \end{aligned}$$
(51)

Letting \(k\rightarrow \infty \) results in

$$\begin{aligned} \sum ^{\infty }_{t=0}{\Vert E_{\mathbf{w}}(\mathbf{w}^t)\Vert ^2}\le \frac{1}{\gamma }E(\mathbf{w}^0)<\infty \end{aligned}$$
(52)

This immediately gives

$$\begin{aligned} \lim _{k\rightarrow \infty }{\Vert E_{\mathbf{w}}(\mathbf{w}^k)\Vert }=0. \end{aligned}$$
(53)

This proves (33).

According to (A1), the sequence \(\left\{ \mathbf{w}^{m}\right\} \left( m\in \mathbb {N}\right) \) has a subsequence \(\left\{ \mathbf{w}^{m_k}\right\} \) (\(k\in \mathbb {N}\)) that is convergent to, say, \(\mathbf{w}^*\in \varOmega _0\). It follows from (33) and the continuity of \(E_{\mathbf{w}}\left( \mathbf{w}\right) \) that

$$\begin{aligned} \left\| E_{\mathbf{w}}\left( \mathbf{w}^*\right) \right\| =\lim _{k\rightarrow \infty }\left\| E_{\mathbf{w}}\left( \mathbf{w}^{m_k}\right) \right\| =\lim _{m\rightarrow \infty }\left\| E_{\mathbf{w}}\left( \mathbf{w}^{m}\right) \right\| =0. \end{aligned}$$
(54)

This implies that \(\mathbf{w}^*\) is a stationary point of \(E\left( \mathbf{w}\right) \). Hence, \(\left\{ \mathbf{w}^{m}\right\} \) has at least one accumulation point and every accumulation point must be a stationary point.

Next, we prove by contradiction that \(\left\{ \mathbf{w}^{m}\right\} \) has precisely one accumulation point. Assume, to the contrary, that \(\left\{ \mathbf{w}^{m}\right\} \) has at least two accumulation points \(\overline{\mathbf{w}}\ne \widetilde{\mathbf{w}}\). We write \(\mathbf{w}^m=(w^m_1,w^m_2,\ldots ,w^m_{2n})^T\). It is easy to see from (30) that \(\lim _{m\rightarrow \infty }\left\| \mathbf{w}^{m+1}-\mathbf{w}^m\right\| =0\), or equivalently, \(\lim _{m\rightarrow \infty }|w_i^{m+1}-w_i^m|=0\) for \(i=1,2,\ldots ,2n\). Without loss of generality, we assume that the first components of \(\overline{\mathbf{w}}\) and \(\widetilde{\mathbf{w}}\) are not equal, that is, \(\overline{w}_1\ne \widetilde{w}_1\). For any real number \(\lambda \in (0,1)\), let \(w_1^{\lambda }=\lambda \overline{w}_1+(1-\lambda )\widetilde{w}_1\). By Lemma 1, there exists a subsequence \(\left\{ w_1^{m_{k_1}}\right\} \) of \(\left\{ w_1^{m}\right\} \) converging to \(w_1^{\lambda }\) as \(k_1\rightarrow \infty \). Due to the boundedness of \(\left\{ w_2^{m_{k_1}}\right\} \), there is a convergent subsequence \(\left\{ w_2^{m_{k_2}}\right\} \subset \left\{ w_2^{m_{k_1}}\right\} \). We define \(w_2^{\lambda }=\lim _{k_2\rightarrow \infty }w_2^{m_{k_2}}\). Repeating this procedure, we end up with nested subsequences \(\{m_{k_1}\}\supset \{m_{k_2}\}\supset \cdots \supset \{m_{k_{2n}}\}\) such that \(w_i^{\lambda }=\lim _{k_i\rightarrow \infty }w_i^{m_{k_i}}\) for each \(i=1,2,\ldots ,2n\). Write \(\mathbf{w}^{\lambda }=(w_1^{\lambda },w_2^{\lambda },\ldots ,w_{2n}^{\lambda })^T\). Then \(\mathbf{w}^{\lambda }\) is an accumulation point of \(\{\mathbf{w}^m\}\) for any \(\lambda \in (0,1)\). But this means that \(\varOmega _{0,1}\) has interior points, which contradicts \((A3)\). Thus, \(\mathbf{w}^*\) must be the unique accumulation point of \(\{\mathbf{w}^m\}_{m=0}^{\infty }\). This proves (34) and completes the proof of Theorem 1. \(\square \)

Cite this article

Yang, D., Li, Z., Liu, Y. et al. A Modified Learning Algorithm for Interval Perceptrons with Interval Weights. Neural Process Lett 42, 381–396 (2015). https://doi.org/10.1007/s11063-014-9362-9
