
Rethinking the activation function in lightweight network

Multimedia Tools and Applications

Abstract

The activation function plays an important role in neural networks: applying it appropriately can improve accuracy and speed up convergence. In this paper, we study the information loss caused by activation functions in lightweight networks and discuss how activation functions with negative values can be used to address this issue. We propose a method that minimizes changes to the existing network: we only need to replace ReLU with Swish at the appropriate positions of a lightweight network. We call this method enriching activation. Enriching activation is achieved by using activation functions with negative values at the positions where ReLU causes information loss. We also propose a novel activation function for enriching activation, called (H)-SwishX, which adds a learnable maximal value to (H)-Swish. (H)-SwishX learns a suitable maximal value in each layer of the network to reduce the accuracy loss during lightweight network quantization. We verify this enriching activation scheme on popular lightweight networks. Compared with the existing activation schemes adopted by these networks, we demonstrate performance improvements on the CIFAR-10 and ImageNet datasets. We further show that enriching activation transfers well to other tasks, and measure its performance on MSCOCO object detection.
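As a rough illustration of the ideas above, below is a minimal PyTorch sketch of H-Swish together with a hypothetical H-SwishX that caps the activation with a learnable maximal value. The paper's exact formulation and placement rule are not given in the abstract, so the clamping scheme and the per-layer scalar bound used here are assumptions for illustration only.

```python
# Minimal sketch of the activations discussed above. The H-SwishX definition
# below (H-Swish clipped by a learnable upper bound) is an assumption, not
# necessarily the authors' exact formulation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class HSwish(nn.Module):
    """H-Swish: x * ReLU6(x + 3) / 6, a hardware-friendly approximation of Swish."""

    def forward(self, x):
        return x * F.relu6(x + 3.0) / 6.0


class HSwishX(nn.Module):
    """Hypothetical H-SwishX sketch: H-Swish whose output is capped by a
    learnable per-layer maximum, keeping the activation range bounded and
    therefore easier to quantize."""

    def __init__(self, init_max: float = 6.0):
        super().__init__()
        # Learnable maximal value; a single scalar per layer in this sketch.
        self.max_value = nn.Parameter(torch.tensor(init_max))

    def forward(self, x):
        y = x * F.relu6(x + 3.0) / 6.0
        # torch.minimum lets gradients flow into max_value wherever y exceeds it.
        return torch.minimum(y, self.max_value)


if __name__ == "__main__":
    # Illustrative use: swap a ReLU in a lightweight block for H-SwishX.
    act = HSwishX()
    x = torch.randn(2, 16, 32, 32)
    print(act(x).shape)  # torch.Size([2, 16, 32, 32])
```

Under this reading, enriching activation amounts to replacing ReLU with such a negative-valued activation only at the positions where ReLU discards information, rather than throughout the whole network.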




Author information


Corresponding author

Correspondence to Qing Song.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Yang, L., Song, Q., Fan, Z. et al. Rethinking the activation function in lightweight network. Multimed Tools Appl 82, 1355–1371 (2023). https://doi.org/10.1007/s11042-022-13217-z

