Abstract
The computational power of a neuron lies in the spatial grouping of synapses on its dendritic tree. Giving the synaptic grouping process a mathematical representation remains a fascinating line of work for researchers in the neural network community. In the literature, we generally find neuron models that use a summation, radial basis, or product aggregation function as the basic unit of a feed-forward multilayer neural network. All these models and their corresponding networks have their own merits and demerits. An MLP constructs a global approximation to an input–output mapping, while an RBF network, using exponentially decaying localized non-linearities, constructs a local approximation to the input–output mapping. In this paper, we propose two novel compensatory-type aggregation functions for artificial neurons. They produce the net potential as a linear or non-linear composition of basic summation and radial basis operations over a set of input signals. The neuron models based on these aggregation functions offer faster convergence and better training and prediction accuracy. The learning and generalization capabilities of these neurons have been tested on various classification and functional mapping problems. These neurons have also shown excellent generalization ability on two-dimensional transformations.
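The two aggregation functions described above can be illustrated with a short sketch. The exact forms below are assumptions inferred from the update rules in the appendix (the paper's Eqs. 1–7 are not shown in this excerpt): the SRS potential is taken as a weighted sum of a summation term and an RBF term, and the SRP potential as their product; the names `srs_potential`/`srp_potential` and the compensatory weights `H`, `K` are illustrative.

```python
import numpy as np

def srs_potential(z, wS, wRB, H, K, w0, z0=1.0):
    """SRS net potential: a linear composition of summation and
    radial basis terms. Assumed form (consistent with the appendix
    updates): V = H*(wS . z) + K*exp(-||z - wRB||^2) + w0*z0."""
    summation = np.dot(wS, z)                       # W^S Z^T
    rbf = np.exp(-np.linalg.norm(z - wRB) ** 2)     # RBF over complex inputs
    return H * summation + K * rbf + w0 * z0

def srp_potential(z, wS, wRB, H, K, w0, z0=1.0):
    """SRP net potential: a non-linear (product) composition of the
    same two terms. Assumed form: V = H*(wS . z)*(1 + K*rbf) + w0*z0."""
    summation = np.dot(wS, z)
    rbf = np.exp(-np.linalg.norm(z - wRB) ** 2)
    return (H * summation) * (1.0 + K * rbf) + w0 * z0
```

All quantities here are complex except the RBF term, which is real because `np.linalg.norm` of a complex vector uses moduli.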
Notes
In this paper, a neuron with only a summation aggregation function is referred to as a ‘conventional’ neuron, and a network of these neurons as an ‘MLP’.
\(\Re\) and \(\Im\) stand for the real and imaginary components of a complex value.
Appendix: Derivation of update rule
In a multilayer network, we consider a commonly used three-layer structure (L–M–N). The first layer has L inputs, the second layer has M proposed neurons described in Sect. 2, and the output layer consists of N conventional neurons. All weights, thresholds, and input–output signals are complex numbers. By convention, \(w_{lm}\) is the weight that connects the \(l{\rm th}\) neuron to the \(m{\rm th}\) neuron. \(\eta \in [0,1]\) is the learning rate and \(f^{\prime}\) is the derivative of the function f. Let \(Z=[ z_1, z_2, \ldots, z_L]\) be the vector of input signals. \(Z^{\rm T}\) is the transpose of vector Z and \(\overline{z}\) is the complex conjugate of z. \(W^S_m = [ w^S_{1m}, w^S_{2m}, \ldots, w^S_{Lm}]\) is the vector of weights from the inputs \((l=1,\ldots, L)\) to the summation part of the \(m{\rm th}\) SRS or SRP neuron, and \(W^{{\rm RB}}_m=[w^{{\rm RB}}_{1m}, w^{{\rm RB}}_{2m}, \ldots, w^{{\rm RB}}_{Lm}]\) is the vector of weights from the inputs to the radial basis part of the \(m{\rm th}\) SRS or SRP neuron. \(w_0\) is a bias and \(z_0\) is the bias input. From Eq. 7, the output of each neuron \((m=1,\ldots, M)\) in the hidden layer can be expressed as:
Let \(V_m^\sigma\) be the net potential of the SRS neuron in the hidden layer; then from Eqs. 4 and 5:
Let \(V_m^\pi\) be the net potential of the SRP neuron in the hidden layer; then from Eqs. 1 and 2:
The net internal potential of the SRP neuron in Eq. 18 can also be expressed term by term as follows:
The output of a neuron (\(n=1,\ldots, N\)) in the output layer is given by:
Let YD be the desired output. The output error consists of real and imaginary parts and is defined as:
where \(\Re(e_n)=\Re(YD_n)-\Re(Y_n)\) and \(\Im(e_n)=\Im(YD_n)-\Im(Y_n).\) The real-valued cost function (MSE) can be given as:
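As a concrete illustration, the real-valued cost over complex outputs can be computed from the real and imaginary error components separately. The normalization \(1/(2N)\) below is an assumption chosen so that the gradients carry the \(1/N\) factor seen in the later equations; the function name is illustrative.

```python
import numpy as np

def complex_mse(yd, y):
    """Real-valued MSE over complex outputs, assuming the convention
    E = (1/2N) * sum_n ( Re(e_n)^2 + Im(e_n)^2 ), with e = yd - y."""
    e = np.asarray(yd) - np.asarray(y)
    n = e.size
    return float((e.real ** 2 + e.imag ** 2).sum() / (2 * n))
```

Note that E is real even though the errors are complex, which is what allows ordinary gradient descent on the real and imaginary parts of each weight.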
The complex-BP algorithm minimizes the cost function E by recursively altering the weight coefficients based on gradient descent, given by:
where the gradient \(\bigtriangledown_{w}E\) is derived with respect to both real and imaginary parts of complex weights.
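In other words, each complex weight is treated as two real parameters. A minimal sketch of one such step, assuming \(\Updelta w = -\eta\,(\partial E/\partial \Re(w) + j\,\partial E/\partial \Im(w))\) and that the two real partial derivatives have already been computed:

```python
def update_weight(w, dE_dRe, dE_dIm, eta=0.1):
    """One gradient-descent step on a complex weight, descending
    along the real and imaginary partial derivatives of the real
    cost E (dE_dRe, dE_dIm are assumed precomputed)."""
    return w - eta * (dE_dRe + 1j * dE_dIm)
```

Packing the two real gradients into one complex number is what later lets the update rules collapse into compact conjugate forms such as Eq. 31.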
For any weight in the output layer, \(w = w_{mn}\)
and
The weight update equation for the weights between the input and hidden layers can be obtained by calculating the gradient of the cost function with respect to these weights. For \(w=w_{lm}\), following the chain rule of differentiation:
Case I. In a three-layer network, the difference lies at the hidden layer, where either of the proposed neurons may be used. When the SRS neuron is in the hidden layer, \(V_m=V_m^\sigma\) and Eq. 28 can be rewritten as:
$$ \begin{aligned} {\frac{-\partial E}{\partial \Re(w_{lm}^S)}} =&\;{\frac{1}{N}}\left\{{\frac{\partial \left(\Re\left(V_m^\sigma \right) \right)}{\partial \Re\left(w_{lm}^S \right)}}\Re\left(\Gamma_{m}^{\sigma} \right)+{\frac{\partial \left(\Im\left(V_m^\sigma \right) \right)}{\partial \Re\left(w_{lm}^S \right)}}\Im\left(\Gamma_{m}^{\sigma} \right) \right\}\\ =&\, {\frac{1}{N}}\left\{(\Re(H_m) \Re(z_l)-\Im(H_m) \Im(z_l)) \Re\left(\Gamma_{m}^{\sigma}\right)+(\Re(H_m) \Im(z_l)+\Im(H_m) \Re(z_l)) \Im\left(\Gamma_{m}^{\sigma} \right) \right\} \end{aligned} $$(29)

Similarly
$$ {\frac{-\partial E}{\partial \Im\left(w_{lm}^S\right)}}={\frac{1}{N}}\left\{ (\Re(H_m) (-\Im(z_l))-\Im(H_m)\Re(z_l)) \Re\left(\Gamma_{m}^{\sigma}\right)+(\Re(H_m) \Re(z_l)-\Im(H_m) \Im(z_l)) \Im\left(\Gamma_{m}^{\sigma}\right) \right\} $$(30)

Now substituting Eqs. 29 and 30 into Eq. 24:
$$ \Updelta w^{S}_{lm}={\frac{\eta}{N}}\left\{ (\Re(z_l)-j \Im(z_l)) (\Re(H_m)-j \Im(H_m)) (\Re(\Gamma_{m}^{\sigma}) +j \Im(\Gamma_{m}^{\sigma}))\right\}={\frac{\eta}{N}} \overline{z_l} \overline{H}_{m} \Gamma_{m}^{\sigma} $$(31)

Following the same procedure, update equations for the other learning parameters can be obtained:
$$ \Updelta w^{{\rm RB}}_{lm}={\frac{2\eta} {N}} \exp\left(-\left\|Z - W^{{\rm RB}}_m\right\|^2\right) \left(z_l - w^{{\rm RB}}_{lm}\right) \left\{ \Re\left(\Gamma_{m}^{\sigma}\right) \Re(K_m) + \Im\left(\Gamma_{m}^{\sigma}\right)\Im(K_m) \right\} $$(32)

$$ \Updelta H_m={\frac{\eta}{N}} \overline{\left(W^{S}_{m} Z^{\rm T}\right)} \Gamma_{m}^{\sigma} \quad \Updelta K_m={\frac{\eta}{N}} \exp\left(-\left\|Z - W^{{\rm RB}}_m\right\|^2\right) \Gamma_{m}^{\sigma} \quad \Updelta w_{0m}={\frac{\eta}{N}} \overline{z_0} \Gamma_{m}^{\sigma} $$(33)
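The SRS hidden-layer updates of Eqs. 31–33 can be sketched directly in code. Here `gamma` stands for the back-propagated term \(\Gamma_{m}^{\sigma}\), which is assumed to have been computed from the output-layer errors; the function name and argument layout are illustrative.

```python
import numpy as np

def srs_hidden_updates(z, wS, wRB, H, K, gamma, eta, N, z0=1.0):
    """Update increments for one SRS hidden neuron (Eqs. 31-33).

    z, wS, wRB : complex vectors (inputs, summation and RBF weights)
    H, K       : complex compensatory weights
    gamma      : back-propagated error term Gamma_m^sigma (precomputed)
    """
    rbf = np.exp(-np.linalg.norm(z - wRB) ** 2)            # exp(-||Z - W_RB||^2)
    d_wS = (eta / N) * np.conj(z) * np.conj(H) * gamma     # Eq. 31
    d_wRB = (2 * eta / N) * rbf * (z - wRB) * (
        gamma.real * K.real + gamma.imag * K.imag)         # Eq. 32
    d_H = (eta / N) * np.conj(np.dot(wS, z)) * gamma       # Eq. 33
    d_K = (eta / N) * rbf * gamma
    d_w0 = (eta / N) * np.conj(z0) * gamma
    return d_wS, d_wRB, d_H, d_K, d_w0
```

Note that only the radial basis weight update (Eq. 32) mixes real and imaginary parts through a real coefficient; the remaining updates keep the compact conjugate form of Eq. 31.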
Case II. When the SRP neuron is in the hidden layer, \(V_m=V_m^\pi\) and Eq. 28 can be rewritten as:
$$ \begin{aligned} {\frac{-\partial E}{\partial \Re\left(w_{lm}^S\right)}}=&\;{\frac{1}{N}}\left\{ {\frac{\partial\left(\Re\left(V_{m}^{\pi}\right)\right)} {\partial\Re\left(w_{lm}^S\right)}}\Re\left(\Gamma_{m}^{\pi}\right)+{\frac{\partial\left(\Im\left(V_m^\pi\right)\right)}{\partial\Re\left(w_{lm}^S\right)}} \Im\left(\Gamma_{m}^{\pi}\right) \right\}\\=&\; {\frac{1}{N}} \Re\left(\Gamma_{m}^{\pi}\right)\left\{\left(\Re(H_m)+\Re(H_m)\Re\left(V^{\pi_{2}}_m\right)-\Im(H_m)\Im\left(V^{\pi_{2}}_m\right)\right)\Re(z_l) -\left(\Im(H_m)+\Im(H_m)\Re\left(V^{\pi_{2}}_m\right)+\Re(H_m)\Im\left(V^{\pi_{2}}_m\right)\right)\Im(z_l)\right\}\\&\quad+\; \Im\left(\Gamma_{m}^{\pi}\right)\left\{\left(\Im(H_m)+\Im(H_m)\Re\left(V^{\pi_{2}}_m\right)+\Re(H_m)\Im(V^{\pi_{2}}_m)\right)\Re(z_l) +\left(\Re(H_m)+\Re(H_m)\Re\left(V^{\pi_{2}}_m\right)-\Im(H_m)\Im(V^{\pi_{2}}_m)\right)\Im(z_l)\right\}\end{aligned} $$(34)

Similarly
$$ \begin{aligned} {\frac{-\partial E}{\partial \Im(w_{lm}^S)}}=&\;{\frac{1}{N}} \left\{{\frac{\partial (\Re(V_m^\pi))}{\partial \Im(w_{lm}^S)}} \Re(\Gamma_{m}^{\pi})+{\frac{\partial (\Im(V_m^\pi))}{\partial \Im(w_{lm}^S)}} \Im(\Gamma_{m}^{\pi}) \right\}\\ =&\;{\frac{1}{N}} \Re(\Gamma_{m}^{\pi}) \left\{-(\Re(H_m)+\Re(H_m)\Re(V^{\pi_{2}}_m) -\Im(H_m)\Im(V^{\pi_{2}}_m))\Im(z_l) - (\Im(H_m)+\Im(H_m)\Re(V^{\pi_{2}}_m) +\Re(H_m)\Im(V^{\pi_{2}}_m))\Re(z_l)\right\}\\ &+\; \Im(\Gamma_{m}^{\pi})\left\{(\Re(H_m)+\Re(H_m)\Re(V^{\pi_{2}}_m) -\Im(H_m)\Im(V^{\pi_{2}}_m))\Re(z_l)-(\Im(H_m)+\Im(H_m)\Re(V^{\pi_{2}}_m) +\Re(H_m)\Im(V^{\pi_{2}}_m))\Im(z_l)\right\} \end{aligned} $$(35)
Now substituting Eqs. 34 and 35 in Eq. 24
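By analogy with Eq. 31, the substitution collapses into a compact conjugate form; a sketch of the resulting expression, assuming the grouping \(H_m(1+V^{\pi_{2}}_m)\) implied by the bracketed terms in Eqs. 34 and 35:

$$ \Updelta w^{S}_{lm}={\frac{\eta}{N}}\, \overline{z_l}\; \overline{H_m\left(1+V^{\pi_{2}}_{m}\right)}\; \Gamma_{m}^{\pi} $$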
Following the same procedure, update equations for the other learning parameters can be obtained:
where
and
This completes the derivation.
Tripathi, B.K., Kalra, P.K. The novel aggregation function-based neuron models in complex domain. Soft Comput 14, 1069–1081 (2010). https://doi.org/10.1007/s00500-009-0502-5