
Learning Activation Functions by Means of Kernel Based Neural Networks

  • Conference paper
  • First Online:
AI*IA 2019 – Advances in Artificial Intelligence (AI*IA 2019)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11946)

Abstract

The neuron activation function plays a fundamental role in the complexity of learning. In particular, it is widely known that in recurrent networks the learning of long-term dependencies is problematic due to vanishing (or exploding) gradients, and that this problem is directly related to the structure of the employed activation function. In this paper, we study the problem of learning neuron-specific activation functions through kernel-based neural networks (KBNN) and make the following contributions. First, we give a representation theorem which indicates that the optimal activation function is a kernel expansion over the training set, which is then approximated with a suitable set of points modeling 1-D clusters. Second, we extend the idea to recurrent networks, where the expressiveness of KBNN can be a determining factor in capturing long-term dependencies. We provide results on some key experiments which clearly show the effectiveness of KBNN when compared with RNN and LSTM cells.
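To make the idea concrete, the following is a minimal PyTorch sketch (not the authors' implementation) of a kernel-expansion activation of the general form described above: each neuron computes a weighted sum of Gaussian kernels centred on a fixed 1-D grid of points, with per-neuron learnable mixing coefficients, and the same module is plugged into a vanilla recurrent cell. The Gaussian kernel, the fixed grid standing in for the 1-D cluster points, and all class and parameter names are illustrative assumptions.

import torch
import torch.nn as nn

class KernelActivation(nn.Module):
    """Neuron-specific activation f(s) = sum_j alpha_j * k(s, c_j) (illustrative sketch)."""

    def __init__(self, num_neurons, num_points=20, span=3.0, gamma=1.0):
        super().__init__()
        # Fixed 1-D landmark points c_j, standing in for the cluster centres
        # that approximate the kernel expansion over the training set.
        self.register_buffer("points", torch.linspace(-span, span, num_points))
        # Per-neuron learnable mixing coefficients alpha_j.
        self.alpha = nn.Parameter(0.1 * torch.randn(num_neurons, num_points))
        self.gamma = gamma

    def forward(self, s):
        # s: (batch, num_neurons) -> kernel values: (batch, num_neurons, num_points)
        diff = s.unsqueeze(-1) - self.points
        k = torch.exp(-self.gamma * diff ** 2)  # Gaussian kernel (an assumption)
        return (k * self.alpha).sum(dim=-1)

class KernelRecurrentCell(nn.Module):
    """Vanilla recurrent cell with the learnable kernel activation in place of tanh."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.lin = nn.Linear(input_size + hidden_size, hidden_size)
        self.act = KernelActivation(hidden_size)

    def forward(self, x, h):
        return self.act(self.lin(torch.cat([x, h], dim=-1)))

# Usage: one step of the recurrent cell on random data.
cell = KernelRecurrentCell(input_size=8, hidden_size=16)
x, h = torch.randn(4, 8), torch.zeros(4, 16)
print(cell(x, h).shape)  # torch.Size([4, 16])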


Notes

  1. We use Iverson's notation: given a statement A, we set (A) to 1 if A is true and to 0 if A is false.

  2. We are assuming here that the values of the functions in \(X_p\) at the boundaries, together with the derivatives up to order \(p-1\), are fixed.

  3. Here we omit the dependence of the optimization function on the parameters that define k.

  4. This choice is made to enforce the sparseness of \(\chi\), i.e., to use the smallest number of terms in expansion (5).


Author information


Corresponding author

Correspondence to Giuseppe Marra.



Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Marra, G., Zanca, D., Betti, A., Gori, M. (2019). Learning Activation Functions by Means of Kernel Based Neural Networks. In: Alviano, M., Greco, G., Scarcello, F. (eds) AI*IA 2019 – Advances in Artificial Intelligence. AI*IA 2019. Lecture Notes in Computer Science, vol 11946. Springer, Cham. https://doi.org/10.1007/978-3-030-35166-3_30


  • DOI: https://doi.org/10.1007/978-3-030-35166-3_30

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-35165-6

  • Online ISBN: 978-3-030-35166-3

  • eBook Packages: Computer Science, Computer Science (R0)
