Abstract
Deep neural networks have achieved impressive results in many areas. However, gains in model performance are usually accompanied by increases in depth and width, which makes the models harder to deploy at the edge. To address this problem, a new inference framework, multi-scale adaptive networks (MSAN), is proposed. Specifically, several branches are added at different stages of the network, and scalable attention together with self-distillation is used to improve the performance of the shallow branches. To strengthen the distillation effect and reuse features efficiently, knowledge from shallow and deep layers is fused through selective feature connections. In addition, two adaptive distillation strategies are proposed to further improve self-distillation. MSAN can serve three purposes: improving network performance, static model compression, and dynamic inference. Extensive experiments demonstrate the superior performance of MSAN in all three aspects.
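To make the two ideas named in the abstract concrete, the following is a minimal sketch of a generic self-distillation loss and a generic confidence-based early exit over branch outputs. It is not MSAN's actual method: the temperature, the softmax-confidence exit rule, the threshold value, and the function names (softmax, distillation_loss, early_exit_predict) are all assumptions made for illustration, and the attention modules, selective feature connections, and adaptive distillation strategies are not modeled.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=3.0):
    """KL(teacher || student) on temperature-softened outputs, scaled by T^2.

    In self-distillation the "teacher" is the deepest classifier of the same
    network and the "student" is a shallow branch (assumed setup).
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(
        pi * (math.log(pi) - math.log(max(qi, 1e-12)))
        for pi, qi in zip(p, q)
        if pi > 0
    )
    return temperature ** 2 * kl


def early_exit_predict(branch_logits, threshold=0.9):
    """Return (exit_index, predicted_class) for the first confident branch.

    branch_logits is ordered from shallow to deep. The deepest branch is the
    fallback when no earlier branch reaches the (assumed) threshold.
    """
    if not branch_logits:
        raise ValueError("at least one branch is required")
    last = len(branch_logits) - 1
    for i, logits in enumerate(branch_logits):
        probs = softmax(logits)
        confidence = max(probs)
        if confidence >= threshold or i == last:
            return i, probs.index(confidence)
```

In this sketch, easy inputs leave the network at a shallow branch and hard inputs continue to deeper ones, which is the general trade-off that dynamic inference exploits; raising the threshold shifts more inputs to the deeper, more expensive branches.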
Data availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Acknowledgements
This work was supported in part by the National Key R&D Program of China (2021YFB2501800) and the Tianjin Technology Innovation Guide Special (21YDTPJC00130).
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Li, L., Su, W., Liu, F. et al. Multi-scale adaptive networks for efficient inference. Int. J. Mach. Learn. & Cyber. 15, 267–282 (2024). https://doi.org/10.1007/s13042-023-01908-4
DOI: https://doi.org/10.1007/s13042-023-01908-4