Abstract
Deep learning models are built from a succession of artificial neural network layers, each applying a mathematical transformation to its input before feeding the result to the next layer. This process relies on the non-linearity of the activation function that determines the output of each layer, which facilitates the learning process during training. To improve the performance of these functions, it is essential to understand their non-linear behavior, in particular on their negative part. In this context, the activation functions introduced after ReLU exploit negative values to further improve gradient descent. In this paper, we propose a new activation function based on the sine trigonometric function; it further mitigates the gradient problem and requires less computation time than the Mish function. Experiments performed on several challenging datasets show that the proposed activation function achieves higher test accuracy than both ReLU and Mish in many deep network models.
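The abstract describes SINSIG as a self-gated activation in the family of Swish and Mish, but does not give its closed-form expression. The sketch below therefore only illustrates the self-gating pattern (the input multiplied by a bounded, smooth gate) next to ReLU and Mish; the `sine_gated` function and its sine-of-sigmoid gate are assumptions made for illustration, not the actual SINSIG definition.

```python
import numpy as np

def relu(x):
    # ReLU: zero for negative inputs, identity for positive inputs.
    return np.maximum(0.0, x)

def mish(x):
    # Mish (Misra): x * tanh(softplus(x)); smooth, non-monotonic, and keeps
    # small negative outputs instead of zeroing them as ReLU does.
    return x * np.tanh(np.logaddexp(0.0, x))  # logaddexp(0, x) == softplus(x)

def sine_gated(x):
    # Hypothetical sine-based self-gated activation, shown only to illustrate
    # the "input times bounded gate" pattern; this is NOT the SINSIG formula,
    # which is not stated in the abstract.
    gate = np.sin(0.5 * np.pi / (1.0 + np.exp(-x)))  # sine of a sigmoid, in (0, 1)
    return x * gate

if __name__ == "__main__":
    x = np.linspace(-4.0, 4.0, 9)
    for name, f in [("relu", relu), ("mish", mish), ("sine_gated (assumed)", sine_gated)]:
        print(name, np.round(f(x), 3))
```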
References
Roy, S.K., Manna, S., Dubey, S.R., Chaudhuri, B.B.: LiSHT: non-parametric linearly scaled hyperbolic tangent activation function for neural networks. arXiv preprint arXiv:1901.05894 (2019)
Ramachandran, P., Zoph, B., Le, Q.V.: Swish: a self-gated activation function. arXiv preprint arXiv:1710.05941 (2017)
Misra, D.: Mish: a self regularized non-monotonic neural activation function. arXiv preprint arXiv:1908.08681 (2019)
LeCun, Y., Cortes, C., Burges, C.J.: MNIST handwritten digit database. AT&T Labs. https://yann.lecun.com/exdb/mnist (2010)
Ioffe, S., Szegedy, C.: Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167 (2015)
Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15(1), 1929–1958 (2014)
He, K., Zhang, X., Ren, S., Sun, J.: Identity mappings in deep residual networks. In: European Conference on Computer Vision (ECCV) (2016)
Sandler, M., Howard, A., Zhu, M., et al.: MobileNetV2: inverted residuals and linear bottlenecks. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, USA, June 2018, pp. 4510–4520 (2018)
Hu, J., Shen, L., Sun, G.: Squeeze-and-excitation networks. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, USA, June 2018, pp. 7132–7141 (2018)
Iandola, F.N., Han, S., Moskewicz, M.W., Ashraf, K., Dally, W.J., Keutzer, K.: SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size. arXiv preprint arXiv:1602.07360 (2016)
Zhang, X., Zhou, X., Lin, M., et al.: ShuffleNet: an extremely efficient convolutional neural network for mobile devices. In: 2018 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, USA, June 2018, pp. 6848–6856 (2018)
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Douge, K., Berrahou, A., Talibi Alaoui, Y., Talibi Alaoui, M. (2021). A Self-gated Activation Function SINSIG Based on the Sine Trigonometric for Neural Network Models. In: Renault, É., Boumerdassi, S., Mühlethaler, P. (eds) Machine Learning for Networking. MLN 2020. Lecture Notes in Computer Science, vol 12629. Springer, Cham. https://doi.org/10.1007/978-3-030-70866-5_15
DOI: https://doi.org/10.1007/978-3-030-70866-5_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-70865-8
Online ISBN: 978-3-030-70866-5
eBook Packages: Computer Science, Computer Science (R0)