Abstract
Activation functions play an important role in neural networks: applying them appropriately can improve accuracy and speed up convergence. In this paper, we study the information loss caused by activation functions in lightweight networks and discuss how activation functions with negative values can mitigate this issue. We propose a method that minimizes changes to the existing network: ReLU is simply replaced with Swish at the appropriate positions of a lightweight network. We call this method enriching activation. Enriching activation uses activation functions with negative values at the positions where ReLU causes information loss. We also propose a novel activation function for enriching activation, called (H)-SwishX, which adds a learnable maximal value to (H)-Swish. (H)-SwishX learns a suitable maximal value in each layer of the network, reducing the accuracy drop during lightweight network quantization. We verify this enriching activation scheme on popular lightweight networks. Compared to the activation schemes these networks currently adopt, we demonstrate performance improvements on the CIFAR-10 and ImageNet datasets. We further show that enriching activation transfers well to downstream tasks, and measure its performance on MSCOCO object detection.
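The idea behind SwishX, as described in the abstract, can be sketched in a few lines: Swish (x · sigmoid(x)) passes small negative values, unlike ReLU, and SwishX additionally caps the output at a learnable maximum so activations stay in a bounded, quantization-friendly range. The sketch below is a minimal NumPy illustration under that reading; the cap parameter `max_value` stands in for the per-layer learnable maximum, and is a hypothetical name not taken from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def swish(x):
    # Swish: x * sigmoid(x). Unbounded above, but unlike ReLU it
    # passes small negative values instead of zeroing them out.
    return x * sigmoid(x)

def swish_x(x, max_value):
    # SwishX sketch (assumption): clamp the Swish output at a learnable
    # maximum, analogous to ReLU6's fixed cap of 6, so that the activation
    # range per layer stays bounded for quantization. In training,
    # max_value would be a learnable per-layer parameter.
    return np.minimum(swish(x), max_value)
```

In a real network, `max_value` would be optimized jointly with the weights, letting each layer learn how tight its activation range can be without hurting accuracy.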
Cite this article
Yang, L., Song, Q., Fan, Z. et al. Rethinking the activation function in lightweight network. Multimed Tools Appl 82, 1355–1371 (2023). https://doi.org/10.1007/s11042-022-13217-z