Abstract
Since the origins of artificial neural network research, many models of feedforward networks have been proposed. This paper presents an algorithm that adapts the shape of the activation function to the training data, so that it is learned along with the connection weights. The activation function is interpreted as a piecewise polynomial approximation to the distribution function of the argument of the activation function. An online learning procedure is given, and it is formally proved that it makes the training error decrease or stay the same, except in extreme cases. Moreover, the model is computationally simpler than standard feedforward networks, which makes it suitable for implementation on FPGAs and microcontrollers. However, the present proposal is limited to two-layer, one-output-neuron architectures, because the learned activation functions are not differentiable with respect to the node locations. Experimental results are provided, which show the performance of the proposed algorithm in classification and regression applications.
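As a concrete illustration of the central idea (a minimal sketch of our own in Python, not the authors' reference implementation), the following builds a sigmoid-like activation as a piecewise linear approximation to the empirical distribution function of the pre-activation values. All names, such as PiecewiseCDFActivation and n_nodes, are our own, and this sketch fits the nodes once from data, whereas the paper learns them online together with the connection weights:

```python
import numpy as np

class PiecewiseCDFActivation:
    """Piecewise linear approximation to the empirical distribution
    function (CDF) of the pre-activation values.

    A minimal sketch of the idea described in the abstract; the paper
    itself adapts the activation online along with the weights.
    """

    def __init__(self, n_nodes=8):
        self.n_nodes = n_nodes
        self.q = None  # node locations q_0 < ... < q_m
        self.v = None  # node values, initialized to the CDF levels

    def fit(self, u):
        # Place the nodes at equally spaced quantiles of the observed
        # pre-activations u, so the activation tracks their distribution.
        probs = np.linspace(0.0, 1.0, self.n_nodes)
        self.q = np.quantile(u, probs)
        self.v = probs.copy()  # CDF values at the nodes
        return self

    def __call__(self, u):
        # Piecewise linear interpolation between the nodes; outside the
        # node range the function saturates at 0 and 1, like a sigmoid.
        return np.interp(u, self.q, self.v)

# Toy usage: the fitted activation is a cheap, sigmoid-like squashing
# function adapted to the data instead of a fixed analytic formula.
rng = np.random.default_rng(0)
u = rng.normal(size=1000)        # simulated pre-activation values
act = PiecewiseCDFActivation(n_nodes=8).fit(u)
print(act(np.array([-2.0, 0.0, 2.0])))  # increases from near 0 to near 1
```

Because evaluation reduces to a table lookup and one linear interpolation, such an activation avoids transcendental functions entirely, which is what makes FPGA and microcontroller implementations attractive.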
Acknowledgements
This work is partially supported by the Ministry of Economy and Competitiveness of Spain under grants TIN2014-53465-R (Video surveillance by active search of anomalous events) and TIN2014-57341-R (Metaheuristics, holistic intelligence and smart mobility). It is also partially supported by the Autonomous Government of Andalusia (Spain) under project P12-TIC-657 (Self-organizing systems and robust estimators for video surveillance). All of these grants include funds from the European Regional Development Fund (ERDF). The authors thankfully acknowledge the computer resources, technical expertise and assistance provided by the SCBI (Supercomputing and Bioinformatics) center of the University of Málaga. They also gratefully acknowledge the support of NVIDIA Corporation with the donation of two Titan X GPUs used for this research.
Appendix
Proof of Proposition 2
Let us assume that \(u_{r,i}\in \left[ q_{i,k},q_{i,k+1}\right) \). Therefore,
where \(\delta _{i,k}\left( u_{r,i}\right) \in \left[ 0,1\right] \). The exact form of \(\delta _{i,k}\left( u_{r,i}\right) \) depends on the order of the polynomials \(\gamma \).
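The display equation between the two sentences above did not survive extraction. Under the natural reading that the node values of the learned activation are the distribution-function levels \(k/m\), the omitted relation would take a form such as the following (our reconstruction, with \(f_{i}\) as our own symbol for the activation of hidden unit \(i\), not verbatim from the paper):
\[
f_{i}\left( u_{r,i}\right) =\frac{k+\delta _{i,k}\left( u_{r,i}\right) }{m},\qquad \delta _{i,k}\left( u_{r,i}\right) \in \left[ 0,1\right] ,
\]
where, in the piecewise linear (first-order) case, one would have \(\delta _{i,k}\left( u_{r,i}\right) =\left( u_{r,i}-q_{i,k}\right) /\left( q_{i,k+1}-q_{i,k}\right) \). This reading is consistent with the bound \(\left| 2w_{2,i}/m\right| \) used below, since moving \(u_{r,i}\) to an adjacent interval changes the contribution \(w_{2,i}f_{i}\left( u_{r,i}\right) \) by at most \(2\left| w_{2,i}\right| /m\).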
If \(w_{2,i}\left( y_{r}-z_{r}\right) <0\) and \(\left| \frac{2w_{2,i}}{m}\right| <\lambda \left| y_{r}-z_{r}\right| \), then the update Eq. (36) implies that:
Therefore,
where the bars correspond to the values obtained after executing the update. Moreover, from (4):
Then from (63), (65) and (66):
Since \(w_{2,i}\left( y_{r}-z_{r}\right) <0\), there are two possible cases: (a) \(\left( w_{2,i}>0\right) \wedge \left( y_{r}-z_{r}<0\right) \); (b) \(\left( w_{2,i}<0\right) \wedge \left( y_{r}-z_{r}>0\right) \).
For case (a), since \(\delta _{i,k}\left( u_{r,i}\right) ,\bar{\delta }_{i,k+1}\left( u_{r,i}\right) \in \left[ 0,1\right] \), from (67) we obtain:
On the other hand, since \(y_{r}-z_{r}<0\), \(w_{2,i}>0\), \(\lambda \in \left( 0,1\right] \) and \(\left| \frac{2w_{2,i}}{m}\right| <\lambda \left| y_{r}-z_{r}\right| \) we have:
That is, the new squared error \(\bar{E}_{r}\) is lower than or equal to the old squared error \(E_{r}\), as required.
For case (b), since \(\delta _{i,k}\left( u_{r,i}\right) ,\bar{\delta }_{i,k-1}\left( u_{r,i}\right) \in \left[ 0,1\right] \), from (67) we obtain:
On the other hand, since \(y_{r}-z_{r}>0\), \(w_{2,i}<0\), \(\lambda \in \left( 0,1\right] \) and \(\left| \frac{2w_{2,i}}{m}\right| <\lambda \left| y_{r}-z_{r}\right| \) we have:
Again, the new squared error \(\bar{E}_{r}\) is lower than or equal to the old squared error \(E_{r}\), as required. \(\square \)
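Since the display equations of the proof were not reproduced in this excerpt, the following toy check (our own construction, with an assumed net effect of the update on the output) only illustrates the flavor of Proposition 2: under the premises \(w_{2,i}\left( y_{r}-z_{r}\right) <0\) and \(\left| 2w_{2,i}/m\right| <\lambda \left| y_{r}-z_{r}\right| \), a step of the output \(z_{r}\) toward the target \(y_{r}\) of magnitude at most \(2\left| w_{2,i}\right| /m\) can never increase the squared error:

```python
import numpy as np

# Toy numerical check of Proposition 2's conclusion (our construction,
# not the paper's exact update rule): if the update moves the output z
# toward the target y by at most 2*|w2|/m, and the premise
# |2*w2/m| < lam*|y - z| holds, the step is strictly smaller than the
# residual, so the squared error cannot increase.
rng = np.random.default_rng(42)
m, lam = 10, 0.5  # number of activation intervals and learning rate

for _ in range(10_000):
    w2, y, z = rng.normal(size=3)
    premise = (w2 * (y - z) < 0) and (abs(2 * w2 / m) < lam * abs(y - z))
    if not premise:
        continue  # Proposition 2 only covers this case
    # Assumed net effect of the node update on the output (cases (a)
    # and (b) of the proof): a step toward y of size at most 2*|w2|/m.
    step = rng.uniform(0.0, 2 * abs(w2) / m) * np.sign(y - z)
    E_old = (y - z) ** 2
    E_new = (y - (z + step)) ** 2
    assert E_new <= E_old, "squared error increased"

print("Toy check passed: the squared error never increased.")
```

The essential point, matching the bound \(\left| 2w_{2,i}/m\right| <\lambda \left| y_{r}-z_{r}\right| \) with \(\lambda \in \left( 0,1\right] \), is that the output step is strictly shorter than the residual, so overshooting past \(y_{r}\) into a larger error is impossible.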
Cite this article
López-Rubio, E., Ortega-Zamorano, F., Domínguez, E. et al. Piecewise Polynomial Activation Functions for Feedforward Neural Networks. Neural Process Lett 50, 121–147 (2019). https://doi.org/10.1007/s11063-018-09974-4