
Rethinking the activation function in lightweight network

Multimedia Tools and Applications

Abstract

The activation function plays an important role in neural networks: applying it appropriately can improve accuracy and speed up convergence. In this paper, we study the information loss caused by activation functions in lightweight networks and discuss how activation functions with negative values can be used to address this issue. We propose a method that minimizes changes to the existing network: we only need to replace ReLU with Swish at the appropriate positions of a lightweight network. We call this method enriching activation. Enriching activation is achieved by using activation functions with negative values at the positions where ReLU causes information loss. We also propose a novel activation function for enriching activation, called (H)-SwishX, which adds a learnable maximal value to (H)-Swish. (H)-SwishX learns a suitable maximal value in each layer of the network to reduce the accuracy loss during lightweight network quantization. We verify this enriching activation scheme on popular lightweight networks. Compared with the existing activation schemes adopted by these networks, we demonstrate performance improvements on the CIFAR-10 and ImageNet datasets. We further show that enriching activation transfers well to other tasks, and measure its performance on MSCOCO object detection.
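As a rough illustration of the ideas above, below is a minimal PyTorch sketch of H-Swish together with a hypothetical H-SwishX that caps the activation with a learnable maximal value. The paper's exact formulation and placement rule are not given in the abstract, so the clamping scheme and the per-layer scalar bound used here are assumptions for illustration only.

```python
# Minimal sketch of the activations discussed above. The H-SwishX definition
# below (H-Swish clipped by a learnable upper bound) is an assumption, not
# necessarily the authors' exact formulation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class HSwish(nn.Module):
    """H-Swish: x * ReLU6(x + 3) / 6, a hardware-friendly approximation of Swish."""

    def forward(self, x):
        return x * F.relu6(x + 3.0) / 6.0


class HSwishX(nn.Module):
    """Hypothetical H-SwishX sketch: H-Swish whose output is capped by a
    learnable per-layer maximum, keeping the activation range bounded and
    therefore easier to quantize."""

    def __init__(self, init_max: float = 6.0):
        super().__init__()
        # Learnable maximal value; a single scalar per layer in this sketch.
        self.max_value = nn.Parameter(torch.tensor(init_max))

    def forward(self, x):
        y = x * F.relu6(x + 3.0) / 6.0
        # torch.minimum lets gradients flow into max_value wherever y exceeds it.
        return torch.minimum(y, self.max_value)


if __name__ == "__main__":
    # Illustrative use: swap a ReLU in a lightweight block for H-SwishX.
    act = HSwishX()
    x = torch.randn(2, 16, 32, 32)
    print(act(x).shape)  # torch.Size([2, 16, 32, 32])
```

Under this reading, enriching activation amounts to replacing ReLU with such a negative-valued activation only at the positions where ReLU discards information, rather than throughout the whole network.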




Author information


Corresponding author

Correspondence to Qing Song.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Yang, L., Song, Q., Fan, Z. et al. Rethinking the activation function in lightweight network. Multimed Tools Appl 82, 1355–1371 (2023). https://doi.org/10.1007/s11042-022-13217-z

