
Learning Activation Functions by Means of Kernel Based Neural Networks

  • Conference paper
  • First Online:
AI*IA 2019 – Advances in Artificial Intelligence (AI*IA 2019)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11946)

Abstract

The neuron activation function plays a fundamental role in the complexity of learning. In particular, it is widely known that in recurrent networks the learning of long-term dependencies is problematic due to vanishing (or exploding) gradients, and that this problem is directly related to the structure of the employed activation function. In this paper, we study the problem of learning neuron-specific activation functions through kernel-based neural networks (KBNN) and make the following contributions. First, we give a representation theorem which indicates that the optimal activation function is a kernel expansion over the training set, which is then approximated with a suitable set of points modeling 1-D clusters. Second, we extend the idea to recurrent networks, where the expressiveness of KBNN can be a determining factor in capturing long-term dependencies. We provide results on some key experiments which clearly show the effectiveness of KBNN when compared with RNN and LSTM cells.
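To make the idea concrete, the following is a minimal PyTorch sketch (not the authors' implementation) of a kernel-expansion activation of the general form described above: each neuron computes a weighted sum of Gaussian kernels centred on a fixed 1-D grid of points, with per-neuron learnable mixing coefficients, and the same module is plugged into a vanilla recurrent cell. The Gaussian kernel, the fixed grid standing in for the 1-D cluster points, and all class and parameter names are illustrative assumptions.

import torch
import torch.nn as nn

class KernelActivation(nn.Module):
    """Neuron-specific activation f(s) = sum_j alpha_j * k(s, c_j) (illustrative sketch)."""

    def __init__(self, num_neurons, num_points=20, span=3.0, gamma=1.0):
        super().__init__()
        # Fixed 1-D landmark points c_j, standing in for the cluster centres
        # that approximate the kernel expansion over the training set.
        self.register_buffer("points", torch.linspace(-span, span, num_points))
        # Per-neuron learnable mixing coefficients alpha_j.
        self.alpha = nn.Parameter(0.1 * torch.randn(num_neurons, num_points))
        self.gamma = gamma

    def forward(self, s):
        # s: (batch, num_neurons) -> kernel values: (batch, num_neurons, num_points)
        diff = s.unsqueeze(-1) - self.points
        k = torch.exp(-self.gamma * diff ** 2)  # Gaussian kernel (an assumption)
        return (k * self.alpha).sum(dim=-1)

class KernelRecurrentCell(nn.Module):
    """Vanilla recurrent cell with the learnable kernel activation in place of tanh."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.lin = nn.Linear(input_size + hidden_size, hidden_size)
        self.act = KernelActivation(hidden_size)

    def forward(self, x, h):
        return self.act(self.lin(torch.cat([x, h], dim=-1)))

# Usage: one step of the recurrent cell on random data.
cell = KernelRecurrentCell(input_size=8, hidden_size=16)
x, h = torch.randn(4, 8), torch.zeros(4, 16)
print(cell(x, h).shape)  # torch.Size([4, 16])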


Notes

  1. We use Iverson's notation: given a statement A, we set (A) to 1 if A is true and to 0 if A is false.

  2. We are assuming here that the values of the functions in \(X_p\) at the boundaries, together with the derivatives up to order \(p-1\), are fixed.

  3. Here we omit the dependence of the optimization function on the parameters that define k.

  4. This choice is made to enforce the sparseness of \(\chi\), i.e., to use the smallest number of terms in expansion (5).


Author information


Corresponding author

Correspondence to Giuseppe Marra.



Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Marra, G., Zanca, D., Betti, A., Gori, M. (2019). Learning Activation Functions by Means of Kernel Based Neural Networks. In: Alviano, M., Greco, G., Scarcello, F. (eds) AI*IA 2019 – Advances in Artificial Intelligence. AI*IA 2019. Lecture Notes in Computer Science, vol 11946. Springer, Cham. https://doi.org/10.1007/978-3-030-35166-3_30


  • DOI: https://doi.org/10.1007/978-3-030-35166-3_30

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-35165-6

  • Online ISBN: 978-3-030-35166-3

  • eBook Packages: Computer Science, Computer Science (R0)
