Abstract
The neuron activation function plays a fundamental role in the complexity of learning. In particular, it is widely known that in recurrent networks the learning of long-term dependencies is problematic due to vanishing (or exploding) gradients, and that this problem is directly related to the structure of the employed activation function. In this paper, we study the problem of learning neuron-specific activation functions through kernel-based neural networks (KBNN) and we make the following contributions. First, we give a representation theorem which indicates that the best activation function is a kernel expansion over the training set, which is then approximated with a suitable set of points modeling 1-D clusters. Second, we extend the idea to recurrent networks, where the expressiveness of KBNN can be a determinant factor in capturing long-term dependencies. We provide results on some key experiments which clearly show the effectiveness of KBNN when compared with RNN and LSTM cells.
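To make the construction concrete, the following is a minimal sketch of a kernel-based activation layer. It assumes a Gaussian kernel, a fixed uniform grid standing in for the 1-D cluster points, and per-neuron learnable mixing coefficients; the class name, shapes, and hyperparameters are illustrative assumptions, not the authors' implementation:

```python
import torch
import torch.nn as nn

class KernelActivation(nn.Module):
    """Neuron-specific activation as a kernel expansion over a fixed
    1-D dictionary (a sketch of the KBNN idea, not the paper's code)."""

    def __init__(self, num_neurons: int, num_points: int = 20, gamma: float = 1.0):
        super().__init__()
        # Fixed 1-D dictionary: a uniform grid standing in for the
        # 1-D cluster points that approximate the kernel expansion.
        self.register_buffer("points", torch.linspace(-3.0, 3.0, num_points))
        # One set of learnable mixing coefficients per neuron
        # (small random init so the activation is not identically zero).
        self.coeffs = nn.Parameter(0.1 * torch.randn(num_neurons, num_points))
        self.gamma = gamma

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        # s: (batch, num_neurons) pre-activations.
        diff = s.unsqueeze(-1) - self.points           # (B, N, P)
        k = torch.exp(-self.gamma * diff.pow(2))       # Gaussian kernel values
        return (k * self.coeffs).sum(dim=-1)           # (B, N)
```

A layer with 128 neurons would then be used as `act = KernelActivation(128); y = act(torch.randn(32, 128))`, with `coeffs` trained jointly with the rest of the network by backpropagation.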
Notes
1. We use Iverson's notation: given a statement A, \((A)\) is 1 if A is true and 0 if A is false.
2. We are assuming here that the values of the functions in \(X_p\) at the boundaries, together with the derivatives up to order \(p-1\), are fixed.
3. Here we omit the dependence of the objective function on the parameters that define k.
4. This choice is due to the fact that we want to enforce the sparseness of \(\chi\), i.e., to use the smallest number of terms in expansion (5).
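As a hedged illustration of how such a sparseness requirement on \(\chi\) might be enforced in practice, one could add an L1 penalty on the mixing coefficients to the training loss; the exact penalty used in the paper is tied to expansion (5), which is not reproduced here, so this is only an assumed stand-in:

```python
import torch

def sparsity_penalty(chi: torch.Tensor, lam: float = 1e-3) -> torch.Tensor:
    # The L1 norm drives many coefficients toward zero, i.e. it keeps
    # few active terms in the kernel expansion (assumed regularizer).
    return lam * chi.abs().sum()
```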