Abstract
In many applications, it is natural to use interval data to describe various kinds of uncertainty. This paper is concerned with a one-layer interval perceptron whose weights and outputs are intervals and whose inputs are real numbers. In the original learning method for this interval perceptron, an absolute value function is applied to the newly learned radii of the interval weights so as to force the radii to be positive. This approach seems somewhat unnatural and may cause oscillation in the learning procedure, as indicated by our numerical experiments. In this paper, a modified learning method is proposed for this one-layer interval perceptron. Instead of using the absolute value function, we replace the radius of each interval weight in the error function by a quadratic term. This simple trick adds no computational work to the learning procedure, yet it brings three advantages. First, the radii of the interval weights are guaranteed to remain positive during the learning procedure without the help of the absolute value function. Secondly, the oscillation mentioned above is eliminated and the convergence of the learning procedure is improved, as indicated by our numerical experiments. Finally, as a by-product, the convergence analysis of the learning procedure becomes straightforward, whereas the analysis for the original learning method is difficult, if not impossible, due to the non-smoothness of the absolute value function involved.
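The quadratic-term trick described above can be illustrated with a minimal gradient-descent sketch. The parameterisation below (interval weight \([c_i-v_i^2,\,c_i+v_i^2]\), squared error on output centre and radius, plain online updates) is an assumption for illustration only; the paper's exact error function, momentum term and update equations are not reproduced here.

```python
import numpy as np

def train_interval_perceptron(X, C_target, R_target, eta=0.01, epochs=200, seed=0):
    """Sketch of a one-layer interval perceptron with quadratic radius terms.

    Each interval weight is [c_i - v_i**2, c_i + v_i**2]: the radius is
    parameterised as v_i**2, so it stays non-negative automatically and no
    absolute-value correction of newly learned radii is needed.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    c = rng.normal(size=n)   # centres of the interval weights
    v = rng.normal(size=n)   # radii are v**2 >= 0 by construction
    for _ in range(epochs):
        for x, ct, rt in zip(X, C_target, R_target):
            yc = c @ x                  # output centre
            yr = (v**2) @ np.abs(x)     # output radius (interval arithmetic)
            ec, er = yc - ct, yr - rt   # centre and radius errors
            # Gradients of E = (ec**2 + er**2) / 2:
            c -= eta * ec * x                   # dE/dc_i = ec * x_i
            v -= eta * er * 2 * v * np.abs(x)   # dE/dv_i = er * 2 v_i |x_i|
    return c, v**2
```

Note that the radius gradient is smooth in \(v_i\), which is precisely what makes the convergence analysis tractable compared with the non-smooth absolute-value variant.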
References
Han M, Fan JC, Wang J (2011) A dynamic feedforward neural network based on Gaussian particle swarm optimization and its application for predictive control. IEEE Trans Neural Netw 22:1457–1468
Ren XM, Lv XH (2011) Identification of extended Hammerstein systems using dynamic self-optimizing neural networks. IEEE Trans Neural Netw 22:1169–1179
Wade JJ, McDaid LJ, Santos JA, Sayers HM (2010) SWAT: a spiking neural network training algorithm for classification problems. IEEE Trans Neural Netw 21:1817–1830
Zhang NM (2011) Momentum algorithms in neural networks and the applications in numerical algebra, AIMSEC, pp 2192–2195
Heshmaty B, Kandel A (1985) Fuzzy linear regression and its applications to forecasting in uncertain environment. Fuzzy Sets Syst 15:159–191
Kaneyoshi M, Tanaka H, Kamei M, Farata H (1990) New system identification technique using fuzzy regression analysis. In: International symposium on uncertainty modeling and analysis, College Park, MD, USA, pp 528–533
Hashiyama T, Furuhash T, Uchikawa Y (1992) An interval fuzzy model using a fuzzy neural network. In: IEEE international conference neural networks, Baltimore, MD, USA, pp 745–750
Ishibuchi H, Tanaka H (1992) Fuzzy regression analysis using neural networks. Fuzzy Sets Syst 50:257–265
Ishibuchi H, Tanaka H, Okada H (1993) An architecture of neural networks with interval weights and its application to fuzzy regression analysis. Fuzzy Sets Syst 57:27–39
Ishibuchi H, Nii M (2001) Fuzzy regression using asymmetric fuzzy coefficients and fuzzified neural networks. Fuzzy Sets Syst 119:273–290
Hernandez CA, Espf J, Nakayama K, Fernandez M (1993) Interval arithmetic backpropagation. In: Proceedings of 1993 international joint conference on neural networks, pp 375–378
Drago GP, Ridella S (1998) Pruning with interval arithmetic perceptron. Neurocomputing 18:229–246
Drago GP, Ridella S (1999) Possibility and necessity pattern classification using an interval arithmetic perceptron. Neural Comput Appl 8:40–52
Roque AMS, Mate C, Arroyo J, Sarabia A (2007) iMLP: applying multi-layer perceptrons to interval-valued data. Neural Process Lett 25:157–169
Shao HM, Zheng GF (2011) Convergence analysis of a back-propagation algorithm with adaptive momentum. Neurocomputing 74:749–752
Wang J, Yang J, Wu W (2011) Convergence of cyclic and almost-cyclic learning with momentum for feedforward neural networks. IEEE Trans Neural Netw 22:1297–1306
Xu ZB, Zhang R, Jing WF (2009) When on-line BP training converges. IEEE Trans Neural Netw 20:1529–1539
Moore RE (1966) Interval analysis. Prentice-Hall, Englewood Cliffs, NJ
Sunaga T (1958) Theory of an interval algebra and its applications to numerical analysis. RAAG Mem 2:29–46
Wu W, Wang J, Cheng MS, Li ZX (2011) Convergence analysis of online gradient method for BP neural networks. Neural Netw 24:91–98
Xu DP, Zhang HS, Liu LJ (2010) Convergence analysis of three classes of split-complex gradient algorithms for complex-valued recurrent neural networks. Neural Comput 22:2655–2677
Yao XF, Wang SD, Dong SQ (2004) Approximation of interval models by neural networks. In: Proceedings of 2004 IEEE international joint conference on neural networks, vol 2, pp 1027–1032
Acknowledgments
This work is supported by the National Natural Science Foundation of China (11171367) and the Fundamental Research Funds for the Central Universities of China.
Appendix
First we borrow an important lemma from [20]:
Lemma 1
Let \(\left\{ b_m\right\} \) be a bounded sequence satisfying \(\lim _{m\rightarrow \infty }(b_{m+1}-b_m)=0\). Write \(\gamma _1=\lim _{n\rightarrow \infty }\inf _{m>n} b_m,\,\gamma _2=\lim _{n\rightarrow \infty }\sup _{m>n} b_m\) and \(S=\{a\in {\mathbb {R}}:\) There exists a subsequence \(\{b_{i_k}\}\) of \(\{b_m\}\) such that \(b_{i_k}\rightarrow a \) as \(k\rightarrow \infty \}\). Then we have
Some useful estimates are gathered in the next two lemmas.
Lemma 2
For any \(k=0,1,2,\ldots \) and \(1\le j\le J\), we have:
Proof
These equalities are direct consequences of (24), (25) and (29). \(\square \)
Lemma 3
Suppose Assumption \((A1)\) holds. Then for any \(k=0,1,2,\ldots \) and \(1\le j\le J\), we have:
where \(M_i (i=0,1,2,3)\) are constants independent of \(k\).
Proof
Proof of (39): Inequality (39) follows easily for the given training sample set from Assumption \((A1)\), (22) and (23).
Proof of (40): By (29) and (39), (40) follows from
where \(M_1=\frac{\beta J}{2}M_0^2\).
Proof of (41): (41) follows from (29), (39) and
where \(M_2=2J(1-\beta )M^2_0\).
Proof of (42): By using (29), (39) and the Cauchy-Schwarz inequality, (42) results from
where \(M_3=J(1-\beta )M^4_0\). This completes the proof of Lemma 3. \(\square \)
Now we are ready to prove Theorem 1.
Proof of Theorem 1
By (23) and (28), we obtain that
Using Taylor's expansion and Lemmas 2 and 3, for any \(k=0,1,2,\ldots \), we have
where \(M_4=\max \{M_2,M_1+M_3\}\) and \(\gamma =\eta -M_4\eta ^2\).
We require the learning rate \(\eta \) to satisfy
This together with (48) and (49) leads to
Inequality (32) is thus proved.
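Although the displayed condition on \(\eta \) is not reproduced here, it can presumably be recovered from the definition of \(\gamma \) above: since
\[
\gamma =\eta -M_4\eta ^2=\eta \left( 1-M_4\eta \right) ,
\]
the natural requirement is \(0<\eta <1/M_4\), which is exactly the condition for \(\gamma >0\), so that each iteration strictly decreases the error unless the gradient vanishes.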
By (48), we get
Since \(E(\mathbf{w}^{k+1})\ge 0\), we have
Letting \(k\rightarrow \infty \) results in
This immediately gives
This proves (33).
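The telescoping step behind (33) can be sketched as follows, assuming (48) takes the standard form \(E(\mathbf{w}^{k+1})\le E(\mathbf{w}^{k})-\gamma \Vert E_{\mathbf{w}}(\mathbf{w}^{k})\Vert ^2\): summing over \(k=0,1,\ldots ,K\) gives
\[
\gamma \sum _{k=0}^{K}\Vert E_{\mathbf{w}}(\mathbf{w}^{k})\Vert ^2\le E(\mathbf{w}^{0})-E(\mathbf{w}^{K+1})\le E(\mathbf{w}^{0}),
\]
so the series \(\sum _{k=0}^{\infty }\Vert E_{\mathbf{w}}(\mathbf{w}^{k})\Vert ^2\) converges and hence \(\lim _{k\rightarrow \infty }\Vert E_{\mathbf{w}}(\mathbf{w}^{k})\Vert =0\).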
According to (A1), the sequence \(\left\{ \mathbf{w}^{m}\right\} \left( m\in \mathbb {N}\right) \) has a subsequence \(\left\{ \mathbf{w}^{m_k}\right\} \) (\(k\in \mathbb {N}\)) that is convergent to, say, \(\mathbf{w}^*\in \varOmega _0\). It follows from (33) and the continuity of \(E_{\mathbf{w}}\left( \mathbf{w}\right) \) that
This implies that \(\mathbf{w}^*\) is a stationary point of \(E\left( \mathbf{w}\right) \). Hence, \(\left\{ \mathbf{w}^{m}\right\} \) has at least one accumulation point and every accumulation point must be a stationary point.
Next, we prove by contradiction that \(\left\{ \mathbf{w}^{m}\right\} \) has precisely one accumulation point. Assume on the contrary that \(\left\{ \mathbf{w}^{m}\right\} \) has at least two accumulation points \(\overline{\mathbf{w}}\ne \widetilde{\mathbf{w}}\). We write \(\mathbf{w}^m=(w^m_1,w^m_2,\ldots ,w^m_{2n})^T\). It is easy to see from (30) that \(\lim _{m\rightarrow \infty }\left\| \mathbf{w}^{m+1}-\mathbf{w}^m\right\| =0\), or equivalently, \(\lim _{m\rightarrow \infty }|w_i^{m+1}-w_i^m|=0\) for \(i=1,2,\ldots ,2n\). Without loss of generality, we assume that the first components of \(\overline{\mathbf{w}}\) and \(\widetilde{\mathbf{w}}\) differ, that is, \(\overline{w}_1\ne \widetilde{w}_1\). For any real number \(\lambda \in (0,1)\), let \(w_1^{\lambda }=\lambda \overline{w}_1+(1-\lambda )\widetilde{w}_1\). By Lemma 1, there exists a subsequence \(\left\{ w_1^{m_{k_1}}\right\} \) of \(\left\{ w_1^{m}\right\} \) converging to \(w_1^{\lambda }\) as \(k_1\rightarrow \infty \). Due to the boundedness of \(\left\{ w_2^{m_{k_1}}\right\} \), there is a convergent subsequence \(\left\{ w_2^{m_{k_2}}\right\} \subset \left\{ w_2^{m_{k_1}}\right\} \); we define \(w_2^{\lambda }=\lim _{k_2\rightarrow \infty }w_2^{m_{k_2}}\). Repeating this procedure, we end up with nested subsequences \(\{m_{k_1}\}\supset \{m_{k_2}\}\supset \cdots \supset \{m_{k_{2n}}\}\) with \(w_i^{\lambda }=\lim _{k_i\rightarrow \infty }w_i^{m_{k_i}}\) for each \(i=1,2,\ldots ,2n\). Write \(\mathbf{w}^{\lambda }=(w_1^{\lambda },w_2^{\lambda },\ldots ,w_{2n}^{\lambda })^T\). Then \(\mathbf{w}^{\lambda }\) is an accumulation point of \(\{\mathbf{w}^m\}\) for every \(\lambda \in (0,1)\). But this means that \(\varOmega _{0,1}\) has interior points, which contradicts \((A3)\). Thus, \(\mathbf{w}^*\) is the unique accumulation point of \(\{\mathbf{w}^m\}_{m=0}^{\infty }\). This proves (34) and completes the proof of Theorem 1.
\(\square \)
Cite this article
Yang, D., Li, Z., Liu, Y. et al. A Modified Learning Algorithm for Interval Perceptrons with Interval Weights. Neural Process Lett 42, 381–396 (2015). https://doi.org/10.1007/s11063-014-9362-9