Multi-scale adaptive networks for efficient inference

  • Original Article
International Journal of Machine Learning and Cybernetics

Abstract

Deep neural networks have achieved impressive results in many areas. However, gains in model performance usually come with increases in depth and width, which makes models harder to deploy at the edge. To address this problem, a new inference framework, multi-scale adaptive networks (MSAN), is proposed. Specifically, several branches are added at different stages of the network, and a scalable attention mechanism together with self-distillation is used to improve the performance of the shallow branches. To enhance the distillation effect and to reuse features efficiently, knowledge from shallow and deep layers is fused through selective feature connections. In addition, two adaptive distillation strategies are proposed to further improve self-distillation. MSAN can be used to improve network performance, for static model compression, and for dynamic inference. Extensive experiments demonstrate the superior performance of MSAN in all three settings.
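
The two generic ingredients named in the abstract, self-distillation between branches and dynamic inference over multiple exits, can be illustrated with a minimal sketch. This is not the MSAN implementation: the abstract does not specify the loss, the exit criterion, or any hyperparameters. The sketch assumes a standard temperature-softened KL distillation loss in which the deepest exit acts as teacher, and a maximum-softmax-probability threshold as the exit rule. The names `distillation_loss`, `early_exit_predict`, `temperature`, and `threshold` are hypothetical.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over a list of floats."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_sum = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_sum for s in scaled]


def distillation_loss(student_logits, teacher_logits, temperature=3.0):
    """KL(teacher || student) on temperature-softened outputs, scaled by T^2.

    Assumption: the deepest exit plays the teacher and a shallow branch
    plays the student. The paper's adaptive strategies are not modelled.
    """
    log_p = log_softmax(teacher_logits, temperature)
    log_q = log_softmax(student_logits, temperature)
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return kl * temperature ** 2


def early_exit_predict(exit_logits, threshold=0.9):
    """Return (predicted class, index of the exit that was used).

    Exits are visited from shallow to deep. Inference stops at the first
    exit whose top softmax probability reaches `threshold` (an assumed
    criterion); the last exit always answers.
    """
    if not exit_logits:
        raise ValueError("at least one exit is required")
    last = len(exit_logits) - 1
    for i, logits in enumerate(exit_logits):
        probs = [math.exp(lp) for lp in log_softmax(logits)]
        confidence = max(probs)
        if confidence >= threshold or i == last:
            return probs.index(confidence), i


# Made-up logits for three exits of one input, shallow to deep.
exits = [[0.2, 0.1, 0.3], [4.0, 0.5, 0.1], [9.0, 0.0, 0.0]]
print(early_exit_predict(exits, threshold=0.9))  # (0, 1): second exit is confident enough
print(distillation_loss(exits[0], exits[-1]) > 0)  # True: shallow branch disagrees with teacher
```

Raising `threshold` pushes more inputs to deeper exits, trading computation for accuracy; a threshold above 1 always uses the final exit.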

Data availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Acknowledgements

This work was supported in part by the National Key R&D Program of China (2021YFB2501800) and the Tianjin Technology Innovation Guide Special (21YDTPJC00130).

Author information

Corresponding author

Correspondence to Fang Liu.

Ethics declarations

Conflict of interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Li, L., Su, W., Liu, F. et al. Multi-scale adaptive networks for efficient inference. Int. J. Mach. Learn. & Cyber. 15, 267–282 (2024). https://doi.org/10.1007/s13042-023-01908-4

  • DOI: https://doi.org/10.1007/s13042-023-01908-4
