
PSigmoid: Improving squeeze-and-excitation block with parametric sigmoid

Applied Intelligence (2021)

Abstract

The Squeeze-and-Excitation Network (SENet) won the final ImageNet Large Scale Visual Recognition Challenge (ILSVRC 2017) classification competition and remains very popular in today's vision community. The SE block, the core of SENet, adaptively recalibrates channel-wise features, emphasizing informative channels and suppressing less useful ones. Because SE blocks can be dropped into existing models and effectively improve performance, they are widely used across a variety of tasks. In this paper, we propose a novel Parametric Sigmoid (PSigmoid) to enhance the SE block, and we name the resulting module the PSigmoid SE (PSE) block. The PSE block can not only suppress features in a channel-wise manner but also enhance them. We evaluate our method on four common datasets: CIFAR-10, CIFAR-100, SVHN and Tiny ImageNet. Experimental results show the effectiveness of our method. In addition, we compare the PSE block with the SE block through a detailed analysis of its configuration. Finally, we combine the PSE block with the SE block to obtain better performance.
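
To make the mechanism concrete, the following is a minimal sketch, not the authors' implementation: an SE-style block whose fixed sigmoid gate is replaced by a parametric one. The `PSigmoid` parameterization shown here, a learnable per-channel scale on the standard sigmoid, is an illustrative assumption rather than the paper's exact formulation; because its output can exceed 1, it shows how such a gate could enhance channels as well as suppress them.

```python
import torch
import torch.nn as nn

class PSigmoid(nn.Module):
    """Illustrative parametric sigmoid (assumed form, not the paper's exact
    definition): a learnable per-channel scale on the standard sigmoid,
    so the gate can amplify (>1) as well as suppress (<1) a channel."""
    def __init__(self, channels):
        super().__init__()
        self.alpha = nn.Parameter(torch.ones(channels))  # assumed init at 1

    def forward(self, x):
        # x: (batch, channels) excitation logits
        return self.alpha * torch.sigmoid(x)

class PSEBlock(nn.Module):
    """SE block [18] with the fixed sigmoid gate swapped for PSigmoid."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc1 = nn.Linear(channels, channels // reduction)
        self.relu = nn.ReLU(inplace=True)
        self.fc2 = nn.Linear(channels // reduction, channels)
        self.gate = PSigmoid(channels)

    def forward(self, x):
        b, c, _, _ = x.shape
        s = x.mean(dim=(2, 3))                # squeeze: global average pooling
        s = self.fc2(self.relu(self.fc1(s)))  # excitation: bottleneck MLP
        w = self.gate(s).view(b, c, 1, 1)     # channel-wise gate
        return x * w                          # recalibrate the input features

# Example: PSEBlock(64)(torch.randn(2, 64, 8, 8)) returns a tensor of the same shape.
```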


Notes

  1. Width is the number of channels in a layer.

  2. One stage means that only one of the three stages of the model integrates the block.

  3. Two stages means that two of the three stages of the model integrate the block.

  4. Three stages means that the block is integrated into all three stages of the model; this is the configuration used in the original paper [18] and in Section 4 (see the sketch after these notes).
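
To illustrate these configurations, here is a minimal, hypothetical sketch of a three-stage backbone in which the PSE block (as sketched after the abstract) is integrated per stage. The `conv_block` and `make_stage` helpers and the channel widths are illustrative assumptions, not the paper's architecture.

```python
import torch.nn as nn

def conv_block(channels):
    # Stand-in 3x3 conv block; the paper's backbone blocks may differ.
    return nn.Sequential(
        nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
        nn.BatchNorm2d(channels),
        nn.ReLU(inplace=True),
    )

def make_stage(channels, num_blocks, use_pse):
    # Append a PSEBlock (defined in the earlier sketch) after each conv
    # block when this stage is selected for integration.
    layers = []
    for _ in range(num_blocks):
        layers.append(conv_block(channels))
        if use_pse:
            layers.append(PSEBlock(channels))
    return nn.Sequential(*layers)

# "One stage": only one of the three stages integrates the block;
# "three stages" sets use_pse=True everywhere, as in [18] and Section 4.
stage1 = make_stage(16, num_blocks=3, use_pse=False)
stage2 = make_stage(32, num_blocks=3, use_pse=False)
stage3 = make_stage(64, num_blocks=3, use_pse=True)
```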

References

  1. LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521(7553):436–444. https://doi.org/10.1038/nature14539

  2. Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M, Berg AC, Fei-Fei L (2015) ImageNet Large Scale Visual Recognition Challenge. Int J Comput Vis 115:211–252. https://doi.org/10.1007/s11263-015-0816-y

  3. Krizhevsky A, Sutskever I, Hinton G (2012) ImageNet classification with deep convolutional neural networks. In: Advances in neural information processing systems, pp 1097–1105. https://doi.org/10.1145/3065386

  4. Simonyan K, Zisserman A (2015) Very deep convolutional networks for large-scale image recognition. In: Proceedings of the International conference on learning representations

  5. Szegedy C, Liu W, Jia Y, Sermanet P et al (2015) Going deeper with convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1–9. https://doi.org/10.1109/CVPR.2015.7298594

  6. Bianchini M, Scarselli F (2014) On the Complexity of Neural Network Classifiers: A Comparison Between Shallow and Deep Architectures. IEEE Trans Neural Netw Learn Syst 25(8):1553–1565. https://doi.org/10.1109/TNNLS.2013.2293637

  7. Huang G, Liu Z, Pleiss G, van der Maaten L, Weinberger KQ (2019) Convolutional Networks with Dense Connectivity. IEEE Trans Pattern Anal Mach Intell:1–1. https://doi.org/10.1109/TPAMI.2019.2918284

  8. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778. https://doi.org/10.1109/CVPR.2016.90

  9. Zhang K, Sun M, Han TX, Yuan Xf, Guo L, Liu T (2018) Residual Networks of Residual Networks: Multilevel Residual Networks. IEEE Trans Circ Syst Video Technol 28(6):1303–1314. https://doi.org/10.1109/TCSVT.2017.2654543

  10. Ioffe S, Szegedy C (2015) Batch normalization: Accelerating deep network training by reducing internal covariate shift. In: Proceedings of the International Conference on Machine Learning, pp 448–456

  11. Glorot X, Bengio Y (2010) Understanding the difficulty of training deep feedforward neural networks. J Mach Learn Res 9:249–256

  12. Srivastava RK, Greff K, Schmidhuber J (2015) Training very deep networks. In: Advances in Neural Information Processing Systems, pp 2377–2385

  13. Huang G, Sun Y, Liu Z, Sedra D, Weinberger KQ (2016) Deep networks with stochastic depth. In: Proceedings of the European Conference on Computer Vision, pp 646–661. https://doi.org/10.1007/978-3-319-46493-0_39

  14. Veit A, Wilber M, Belongie S (2016) Residual networks are exponential ensembles of relatively shallow networks. In: Advances in neural information processing systems, pp 550–558

  15. Xie S, Girshick R, Dollar P, Tu Z, He K (2017) Aggregated residual transformations for deep neural networks. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1492–1500. https://doi.org/10.1109/CVPR.2017.634

  16. Chollet F (2017) Xception: Deep learning with depthwise separable convolutions. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 550–558. https://doi.org/10.1109/CVPR.2017.195

  17. Gao H, Wang Z, Cai L, Ji S (2020) ChannelNets: Compact and Efficient Convolutional Neural Networks via Channel-Wise Convolutions. IEEE Trans Pattern Anal Mach Intell:1–1. https://doi.org/10.1109/TPAMI.2020.2975796

  18. Hu J, Shen L, Sun G (2020) Squeeze-and-Excitation Networks. IEEE Trans Pattern Anal Mach Intell 42(8):2011–2023. https://doi.org/10.1109/TPAMI.2019.2913372

  19. Li Y, Fan C, Li Y, Wu Q, Ming Y (2018) Improving deep neural network with multiple parametric exponential linear units. Neurocomputing 301:11–24. https://doi.org/10.1016/j.neucom.2018.01.084

  20. Zhao H, Liu F, Li L, Luo C (2018) A novel softplus linear unit for deep convolutional neural networks. Appl Intell 48(7):1707–1720. https://doi.org/10.1007/s10489-017-1028-7

  21. He K, Zhang X, Ren S, Sun J (2015) Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In: Proceedings of the IEEE International Conference on Computer Vision, pp 1026–1034. https://doi.org/10.1109/ICCV.2015.123

  22. Njikam ANS, Zhao H (2016) A novel activation function for multilayer feed-forward neural networks. Appl Intell 45(1):75–82. https://doi.org/10.1007/s10489-015-0744-0

  23. Ying Y, Su J, Shan P, Miao L, Wang X, Peng S (2019) Rectified exponential units for convolutional neural networks. IEEE Access 7:101633–101640. https://doi.org/10.1109/ACCESS.2019.2928442

  24. Kim D, Kim J, Kim J (2020) Elastic exponential linear units for convolutional neural networks. Neurocomputing 406:253–266. https://doi.org/10.1016/j.neucom.2020.03.051

  25. Yu X, Ye X, Gao Q (2020) Infrared Handprint Image Restoration Algorithm Based on Apoptotic Mechanism. IEEE Access 8:47334–47343. https://doi.org/10.1109/ACCESS.2020.2979018

  26. Liu X, Zhu X, Li M, Wang L, Zhu E, Liu T, Kloft M, Shen D, Yin J, Gao W (2020) Multiple Kernel k-Means with Incomplete Kernels. IEEE Trans Pattern Anal Mach Intell:1191–1204. https://doi.org/10.1109/TPAMI.2019.2892416

  27. Chandra P, Singh Y (2004) An activation function adapting training algorithm for sigmoidal feedforward networks. Neurocomputing 61:429–437. https://doi.org/10.1016/j.neucom.2004.04.001

  28. Sharma SK, Chandra P (2010) An adaptive slope sigmoidal function cascading neural networks algorithm. In: International Conference on Emerging Trends in Engineering and Technology, pp 531–536. https://doi.org/10.1109/ICETET.2010.71

  29. LeCun Y, Bottou L, Bengio Y, Haffner P (1998) Gradient-based learning applied to document recognition. Proc IEEE 86(11):2278–2324. https://doi.org/10.1109/5.726791

  30. Nair V, Hinton G (2010) Rectified linear units improve restricted Boltzmann machines. In: Proceedings of the International Conference on Machine Learning, pp 807–814

  31. Hahnloser RHR, Sarpeshkar R, Mahowald MA, Douglas RJ, Sebastian Seung H (2000) Digital selection and analogue amplification coexist in a cortex-inspired silicon circuit. Nature 405(6789):947–951. https://doi.org/10.1038/35016072

  32. Zhao M, Zhong S, Fu X, Tang B, Dong S, Pecht M (2020) Deep Residual Networks with Adaptively Parametric Rectifier Linear Units for Fault Diagnosis. IEEE Trans Ind Electron:1–1. https://doi.org/10.1109/TIE.2020.2972458

  33. Clevert DA, Unterthiner T, Hochreiter S (2016) Fast and accurate deep network learning by exponential linear units (ELUs). In: Proceedings of the International Conference on Learning Representations

  34. Zagoruyko S, Komodakis N (2016) Wide residual networks. In: Proceedings of the British Machine Vision Conference. https://doi.org/10.5244/C.30.87

  35. Lin M, Chen Q, Yan S (2014) Network in network. In: Proceedings of the International Conference on Learning Representations

  36. Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2016) Rethinking the inception architecture for computer vision. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 2818–2826. https://doi.org/10.1109/CVPR.2016.308

  37. Szegedy C, Ioffe S, Vanhoucke V (2017) Inception-v4, inception-resnet and the impact of residual connections on learning. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 4278–4284

  38. Zhang X, Zhou X, Lin M, Sun J (2018) ShuffleNet: An extremely efficient convolutional neural network for mobile devices. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 6848–6856. https://doi.org/10.1109/CVPR.2018.00716

  39. Hou S, Wang Z (2019) Weighted channel dropout for regularization of deep convolutional neural network. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp 8425–8432. https://doi.org/10.1609/aaai.v33i01.33018425

  40. Krizhevsky A, Hinton G (2009) Learning multiple layers of features from tiny images. Master's thesis, Department of Computer Science, University of Toronto

  41. Netzer Y, Wang T, Coates A, Bissacco A, Wu B, Ng AY (2011) Reading digits in natural images with unsupervised feature learning. In: Advances in neural information processing systems workshop on deep learning and unsupervised feature learning

  42. Tiny ImageNet Visual Recognition Challenge. [Online] Available: https://tinyimagenet.herokuapp.com

  43. Zeiler MD, Fergus R (2014) Visualizing and understanding convolutional networks. In: Proceedings of the European Conference on Computer Vision, pp 818–833. https://doi.org/10.1007/978-3-319-10590-1_53

  44. Alain G, Bengio Y (2017) Understanding intermediate layers using linear classifier probes. In: Proceedings of the International Conference on Learning Representations Workshop

  45. Yosinski J, Clune J, Bengio Y, Lipson H (2014) How transferable are features in deep neural networks? In: Advances in neural information processing systems, pp 3320–3328

  46. Lin T-Y, Maire M, Belongie S et al (2014) Microsoft COCO: Common Objects in Context. In: Proceedings of the European Conference on Computer Vision, pp 740–755. https://doi.org/10.1007/978-3-319-10602-1_48

  47. Wah C, Branson S, Welinder P, Perona P, Belongie S (2011) The Caltech-UCSD Birds-200-2011 dataset. California Institute of Technology

  48. Song W, Zheng J, Wu Y, Chen C, Liu F (2020) Discriminative feature extraction for video person re-identification via multi-task network. Applied Intelligence. https://doi.org/10.1007/s10489-020-01844-8

  49. Wu L, Wang Y, Li X, Gao J (2019) Deep Attention-Based Spatially Recursive Networks for Fine-Grained Visual Recognition. IEEE Trans Cybern 49(5):1791–1802. https://doi.org/10.1109/TCYB.2018.2813971

  50. Zheng Z, An G, Wu D, Ruan Q (2020) Global and Local Knowledge-Aware Attention Network for Action Recognition. IEEE Trans Neural Netw Learn Syst:1–14. https://doi.org/10.1109/TNNLS.2020.2978613

  51. Choe J, Lee S, Shim H (2020) Attention-based Dropout Layer for Weakly Supervised Single Object Localization and Semantic Segmentation. IEEE Trans Pattern Anal Mach Intell:1–1. https://doi.org/10.1109/TPAMI.2020.2999099

  52. Woo S, Park J, Lee J-Y, So Kweon I (2018) CBAM: Convolutional block attention module. In: Proceedings of the European Conference on Computer Vision, pp 3–19. https://doi.org/10.1007/978-3-030-01234-2_1


Acknowledgements

This work was supported by the National Natural Science Foundation of China (Grant No. 61601104) and the Fundamental Research Funds for the Central Universities (Grant No. N2023021). The authors thank Wei Li, Yang Li, Feng Zhang, Feng Jia, Jianlin Su, Xinyu Ou and Shenqi Lai for helpful discussions.

Author information


Correspondence to Peng Shan.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Ying, Y., Zhang, N., Shan, P. et al. PSigmoid: Improving squeeze-and-excitation block with parametric sigmoid. Appl Intell 51, 7427–7439 (2021). https://doi.org/10.1007/s10489-021-02247-z

