Abstract
Activation functions play a pivotal role in learning functions with neural networks: a network achieves its non-linearity through the repeated application of the activation function. Over the years, numerous activation functions have been proposed to improve neural network performance on a range of deep learning tasks. Basic functions such as ReLU, Sigmoid, Tanh, and Softplus have remained favorites in the deep learning community because of their simplicity. In recent years, several novel activation functions derived from these basic functions have been proposed and have improved accuracy on some challenging datasets. We propose three activation functions with trainable parameters, namely EIS-1, EIS-2, and EIS-3, and show that they outperform widely used activation functions on several well-known datasets and models. For example, on the CIFAR100 dataset, EIS-1, EIS-2, and EIS-3 beat ReLU by 5.55%, 5.32%, and 5.60% on ResNet V2 34; 5.27%, 5.24%, and 5.76% on VGG 16; 2.02%, 1.93%, and 2.01% on Wide ResNet 28-10; and 2.30%, 2.11%, and 2.50% on ShuffleNet V2, respectively. On the CIFAR10 dataset, the corresponding gains are 1.40%, 1.27%, and 1.45% on ResNet V2 34; 1.21%, 1.09%, and 1.17% on VGG 16; 1.10%, 1.04%, and 1.16% on Wide ResNet 28-10; and 1.85%, 1.60%, and 1.67% on ShuffleNet V2. The proposed functions also perform better than traditional activation functions such as ReLU, Leaky ReLU, and Swish on object detection, semantic segmentation, and machine translation problems.
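The abstract describes EIS-1, EIS-2, and EIS-3 only as activation functions with trainable parameters; their closed-form definitions are not given in this excerpt. As a minimal sketch of the general mechanism, the PyTorch module below shows how an activation with a learnable parameter can be implemented and swapped in for a fixed non-linearity. It uses a trainable-beta Swish, x * sigmoid(beta * x) (one of the baselines named above), purely as a stand-in; the class name TrainableActivation and the initialization value are illustrative assumptions, not the paper's EIS formulas.

```python
import torch
import torch.nn as nn


class TrainableActivation(nn.Module):
    """Activation with a learnable parameter, updated by backprop like any weight.

    NOTE: hedged stand-in (trainable-beta Swish), not the paper's EIS-1/2/3
    definitions, which are not reproduced in this excerpt.
    """

    def __init__(self, init_beta: float = 1.0):
        super().__init__()
        # Registering beta as an nn.Parameter makes it trainable per layer instance.
        self.beta = nn.Parameter(torch.tensor(init_beta))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * torch.sigmoid(self.beta * x)


# Usage: replace a fixed non-linearity (e.g. nn.ReLU) with the trainable one.
if __name__ == "__main__":
    model = nn.Sequential(
        nn.Linear(32, 64),
        TrainableActivation(),  # in place of nn.ReLU()
        nn.Linear(64, 10),
    )
    out = model(torch.randn(8, 32))
    print(out.shape)  # torch.Size([8, 10])
```

Because the extra parameters live inside the activation modules, they are picked up automatically by the optimizer along with the network weights, which is what makes such activations "trainable" in practice.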