Abstract
How can we efficiently compress a convolutional neural network (CNN) using depthwise separable convolution while retaining its accuracy on classification tasks? Depthwise separable convolution, which replaces a standard convolution with a depthwise convolution followed by a pointwise convolution, has been used to build lightweight architectures. However, previous works based on depthwise separable convolution are limited when compressing a trained CNN model because (1) they are mostly heuristic approaches without a precise understanding of their relation to standard convolution, and (2) their accuracy does not match that of standard convolution. In this paper, we propose Falcon, an accurate and lightweight method for compressing CNNs based on depthwise separable convolution. Falcon uses generalized elementwise product (GEP), our proposed mathematical formulation for approximating the standard convolution kernel, to interpret existing convolution methods based on depthwise separable convolution. By exploiting the knowledge of a trained standard model and carefully determining the order of depthwise separable convolution via GEP, Falcon achieves accuracy close to that of the trained standard model. Furthermore, this interpretation leads to a generalized version, rank-k Falcon, which performs k independent Falcon operations and sums up the results. Experiments show that Falcon (1) provides higher accuracy than existing methods based on depthwise separable convolution and tensor decomposition, and (2) reduces the number of parameters and FLOPs of standard convolution by up to a factor of 8 while maintaining similar accuracy. We also demonstrate that rank-k Falcon further improves the accuracy while slightly sacrificing compression and computation reduction rates.
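To make the idea concrete, the following PyTorch sketch (our illustration, not the authors' implementation; the class names DepthwiseSeparableBlock and RankKBlock are hypothetical) replaces a standard convolution with a depthwise convolution followed by a pointwise 1x1 convolution, and builds a rank-k variant by summing k independent branches, as described in the abstract. Details specific to Falcon, such as fitting the factors to a trained kernel via GEP and the chosen ordering of the two convolutions, are omitted.

import torch
import torch.nn as nn

class DepthwiseSeparableBlock(nn.Module):
    """One depthwise filter per input channel, followed by a pointwise (1x1) convolution."""
    def __init__(self, in_channels, out_channels, kernel_size=3):
        super().__init__()
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size,
                                   padding=kernel_size // 2, groups=in_channels, bias=False)
        self.pointwise = nn.Conv2d(in_channels, out_channels, 1, bias=False)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

class RankKBlock(nn.Module):
    """Sum of k independent depthwise separable branches (illustrating the rank-k idea)."""
    def __init__(self, in_channels, out_channels, kernel_size=3, rank=2):
        super().__init__()
        self.branches = nn.ModuleList(
            DepthwiseSeparableBlock(in_channels, out_channels, kernel_size)
            for _ in range(rank))

    def forward(self, x):
        return sum(branch(x) for branch in self.branches)

x = torch.randn(1, 64, 32, 32)                            # one 64-channel feature map
standard = nn.Conv2d(64, 128, 3, padding=1, bias=False)
compressed = RankKBlock(64, 128, kernel_size=3, rank=2)
print(sum(p.numel() for p in standard.parameters()))      # 73,728 parameters
print(sum(p.numel() for p in compressed.parameters()))    # 17,536 parameters

For this 64-to-128-channel 3x3 configuration, a single branch (rank 1) uses 8,768 parameters, roughly an 8x reduction over the 73,728 parameters of the standard convolution, which matches the order of reduction reported in the abstract; rank 2 trades part of that reduction for a closer approximation.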




Data availability
The code of Falcon is available at https://github.com/snudm-starlab/FALCON2. All relevant data are within the manuscript. The CIFAR10 and CIFAR100 datasets are available from https://www.cs.toronto.edu/~kriz/cifar.html. The ImageNet dataset is available from http://www.image-net.org/. The SVHN dataset is available from http://ufldl.stanford.edu/housenumbers/.
Acknowledgements
This work was supported by the Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) [No. 2021-0-01343, Artificial Intelligence Graduate School Program (Seoul National University)], [No. 2021-0-02068, Artificial Intelligence Innovation Hub (Artificial Intelligence Institute, Seoul National University)], [No. 2017-0-01772, Development of QA systems for Video Story Understanding to pass the Video Turing Test], and [No. 2020-0-00894, Flexible and Efficient Model Compression Method for Various Applications and Environments]. The Institute of Engineering Research and the ICT at Seoul National University provided research facilities for this work.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
The work was done while Hyun Dong Lee was at Seoul National University.
The work was done while Chun Quan was at Seoul National University.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Jang, JG., Quan, C., Lee, H.D. et al. Falcon: lightweight and accurate convolution based on depthwise separable convolution. Knowl Inf Syst 65, 2225–2249 (2023). https://doi.org/10.1007/s10115-022-01818-x