
Falcon: lightweight and accurate convolution based on depthwise separable convolution

  • Regular Paper
  • Published in: Knowledge and Information Systems

Abstract

How can we efficiently compress a convolutional neural network (CNN) using depthwise separable convolution, while retaining its accuracy on classification tasks? Depthwise separable convolution, which replaces a standard convolution with a depthwise convolution followed by a pointwise convolution, has been used to build lightweight architectures. However, previous works based on depthwise separable convolution are limited when compressing a trained CNN model since (1) they are mostly heuristic approaches without a precise understanding of their relation to standard convolution, and (2) their accuracy does not match that of standard convolution. In this paper, we propose Falcon, an accurate and lightweight method to compress CNNs based on depthwise separable convolution. Falcon uses the generalized elementwise product (GEP), our proposed mathematical formulation for approximating the standard convolution kernel, to interpret existing convolution methods based on depthwise separable convolution. By exploiting the knowledge of a trained standard model and carefully determining the order of depthwise separable convolution via GEP, Falcon achieves accuracy close to that of the trained standard model. Furthermore, this interpretation leads to a generalized version, rank-k Falcon, which performs k independent Falcon operations and sums up the results. Experiments show that Falcon (1) provides higher accuracy than existing methods based on depthwise separable convolution and tensor decomposition, and (2) reduces the number of parameters and FLOPs of standard convolution by up to a factor of 8 while maintaining similar accuracy. We also demonstrate that rank-k Falcon further improves the accuracy while slightly sacrificing the compression and computation reduction rates.
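
To make the building blocks concrete, the following is a minimal PyTorch sketch, not the authors' released FALCON implementation (see the repository linked under Data availability), of depthwise separable convolution and of the rank-k idea described above: k independent depthwise-pointwise branches whose outputs are summed. The class names, channel counts, and rank value are illustrative assumptions.

```python
# Minimal sketch of depthwise separable convolution and a rank-k sum of
# separable branches. Illustrative only; not the authors' FALCON2 code.
import torch
import torch.nn as nn


class DepthwiseSeparableConv(nn.Module):
    """Depthwise convolution followed by a pointwise (1x1) convolution."""

    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1):
        super().__init__()
        # Depthwise: one spatial filter per input channel (groups=in_channels).
        self.depthwise = nn.Conv2d(
            in_channels, in_channels, kernel_size,
            stride=stride, padding=kernel_size // 2,
            groups=in_channels, bias=False,
        )
        # Pointwise: 1x1 convolution that mixes channels.
        self.pointwise = nn.Conv2d(in_channels, out_channels, 1, bias=False)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))


class RankKSeparableConv(nn.Module):
    """Sum of k independent depthwise separable branches (rank-k idea)."""

    def __init__(self, in_channels, out_channels, kernel_size=3, rank=2):
        super().__init__()
        self.branches = nn.ModuleList(
            DepthwiseSeparableConv(in_channels, out_channels, kernel_size)
            for _ in range(rank)
        )

    def forward(self, x):
        return sum(branch(x) for branch in self.branches)


if __name__ == "__main__":
    x = torch.randn(1, 64, 32, 32)                      # one 64-channel feature map
    std = nn.Conv2d(64, 128, 3, padding=1, bias=False)  # standard convolution
    sep = DepthwiseSeparableConv(64, 128)
    n_std = sum(p.numel() for p in std.parameters())    # 64*128*3*3 = 73,728
    n_sep = sum(p.numel() for p in sep.parameters())    # 64*3*3 + 64*128 = 8,768
    print(n_std / n_sep)                                # ~8.4x fewer parameters
    print(sep(x).shape, RankKSeparableConv(64, 128, rank=2)(x).shape)
```

For this 64-to-128-channel 3x3 layer, the standard convolution has 64*128*3*3 = 73,728 weights, while the depthwise-pointwise pair has 64*3*3 + 64*128 = 8,768, roughly the factor-of-8 reduction mentioned in the abstract; the rank-k variant multiplies the separable cost by k in exchange for higher accuracy.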


Data availability

The code of Falcon is available at https://github.com/snudm-starlab/FALCON2. All relevant data are within the manuscript. The CIFAR-10 and CIFAR-100 datasets are available at https://www.cs.toronto.edu/~kriz/cifar.html. The ImageNet dataset is available at http://www.image-net.org/. The SVHN dataset is available at http://ufldl.stanford.edu/housenumbers/.


Acknowledgements

This work was supported by Institute of Information & communications Technology Planning & Evaluation (IITP) grants funded by the Korea government (MSIT) [No. 2021-0-01343, Artificial Intelligence Graduate School Program (Seoul National University)], [No. 2021-0-02068, Artificial Intelligence Innovation Hub (Artificial Intelligence Institute, Seoul National University)], [No. 2017-0-01772, Development of QA systems for Video Story Understanding to pass the Video Turing Test], and [No. 2020-0-00894, Flexible and Efficient Model Compression Method for Various Applications and Environments]. The Institute of Engineering Research and the ICT at Seoul National University provided research facilities for this work.

Author information

Corresponding author

Correspondence to U. Kang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

The work was done while Hyun Dong Lee and Chun Quan were at Seoul National University.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Jang, JG., Quan, C., Lee, H.D. et al. Falcon: lightweight and accurate convolution based on depthwise separable convolution. Knowl Inf Syst 65, 2225–2249 (2023). https://doi.org/10.1007/s10115-022-01818-x

