Abstract
Activation functions play a pivotal role in learning functions with neural networks: a network achieves its non-linearity through the repeated application of the activation function. Over the years, numerous activation functions have been proposed to improve neural network performance on a range of deep learning tasks. Basic functions such as ReLU, Sigmoid, Tanh, and Softplus have remained favorites in the deep learning community because of their simplicity. In recent years, several novel activation functions derived from these basic functions have been proposed and have improved accuracy on some challenging datasets. We propose three activation functions with trainable parameters, namely EIS-1, EIS-2, and EIS-3, and show that they outperform widely used activation functions on several well-known datasets and models. For example, on the CIFAR100 dataset, EIS-1, EIS-2, and EIS-3 beat ReLU by 5.55%, 5.32%, and 5.60% on ResNet V2 34; 5.27%, 5.24%, and 5.76% on VGG 16; 2.02%, 1.93%, and 2.01% on Wide ResNet 28-10; and 2.30%, 2.11%, and 2.50% on ShuffleNet V2, respectively. On the CIFAR10 dataset, the corresponding gains are 1.40%, 1.27%, and 1.45% on ResNet V2 34; 1.21%, 1.09%, and 1.17% on VGG 16; 1.10%, 1.04%, and 1.16% on Wide ResNet 28-10; and 1.85%, 1.60%, and 1.67% on ShuffleNet V2. The proposed functions also perform better than traditional activation functions such as ReLU, Leaky ReLU, and Swish on object detection, semantic segmentation, and machine translation problems.
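The abstract describes EIS-1, EIS-2, and EIS-3 only as activation functions with trainable parameters; their closed-form definitions are not given in this excerpt. As a minimal sketch of the general mechanism, the PyTorch module below shows how an activation with a learnable parameter can be implemented and swapped in for a fixed non-linearity. It uses a trainable-beta Swish, x * sigmoid(beta * x) (one of the baselines named above), purely as a stand-in; the class name TrainableActivation and the initialization value are illustrative assumptions, not the paper's EIS formulas.

```python
import torch
import torch.nn as nn


class TrainableActivation(nn.Module):
    """Activation with a learnable parameter, updated by backprop like any weight.

    NOTE: hedged stand-in (trainable-beta Swish), not the paper's EIS-1/2/3
    definitions, which are not reproduced in this excerpt.
    """

    def __init__(self, init_beta: float = 1.0):
        super().__init__()
        # Registering beta as an nn.Parameter makes it trainable per layer instance.
        self.beta = nn.Parameter(torch.tensor(init_beta))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * torch.sigmoid(self.beta * x)


# Usage: replace a fixed non-linearity (e.g. nn.ReLU) with the trainable one.
if __name__ == "__main__":
    model = nn.Sequential(
        nn.Linear(32, 64),
        TrainableActivation(),  # in place of nn.ReLU()
        nn.Linear(64, 10),
    )
    out = model(torch.randn(8, 32))
    print(out.shape)  # torch.Size([8, 10])
```

Because the extra parameters live inside the activation modules, they are picked up automatically by the optimizer along with the network weights, which is what makes such activations "trainable" in practice.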