Attention-based variable-size feature compression module for edge inference

The Journal of Supercomputing

Abstract

Artificial intelligence has made significant breakthroughs in many fields, and the broad deployment of edge devices provides opportunities to develop and apply intelligent models in edge networks. Device-server co-inference has gradually become the mainstream approach to edge intelligent computing. However, existing feature-processing methods for edge inference do not consider how important individual features are, so the processed features remain redundant and reduce inference efficiency. In this paper, we propose a novel attention-based variable-size feature compression module that improves the inference efficiency of edge systems by exploiting the varying importance of the input data. First, we introduce a multi-scale attention mechanism that operates jointly over the channel and spatial dimensions to compute importance weights from the intermediate output features of the edge device. These weights are then used to assign different transmission probabilities, filtering out irrelevant feature data and prioritizing task-relevant information. Second, we design a new loss function and a progressive training strategy to optimize the proposed module, enabling the model to adapt gradually and effectively to the reduced feature data. Finally, experimental results on the CIFAR-10 and ImageNet datasets demonstrate the effectiveness of the proposed solution: it significantly reduces the volume of data output by edge devices and the associated communication overhead, with minimal loss in model accuracy.
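The filtering idea in the abstract can be illustrated with a minimal sketch. It is not the authors' module: the paper's attention is learned and multi-scale, whereas the sketch below swaps in a parameter-free stand-in (mean pooling followed by a sigmoid), and the names `importance_weights`, `compress`, and `decompress` are hypothetical. It only shows how per-element importance weights might act as transmission probabilities, so that the number of transmitted values varies with the input.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def importance_weights(features):
    """Toy channel-spatial importance for a (C, H, W) feature map.

    Assumption: pooling + sigmoid stands in for the learned, multi-scale
    attention described in the abstract.
    """
    # Channel branch: one score per channel from global average pooling,
    # centred so the sigmoid does not saturate.
    channel_score = features.mean(axis=(1, 2))                    # (C,)
    channel_att = sigmoid(channel_score - channel_score.mean())
    # Spatial branch: one score per location from pooling over channels.
    spatial_score = features.mean(axis=0)                         # (H, W)
    spatial_att = sigmoid(spatial_score - spatial_score.mean())
    # Joint weight per element, strictly between 0 and 1.
    return channel_att[:, None, None] * spatial_att[None, :, :]


def compress(features, rng):
    """Device side: keep each element with probability equal to its weight."""
    weights = importance_weights(features)
    mask = rng.random(features.shape) < weights   # Bernoulli sampling
    return mask, features[mask]                   # variable-size payload


def decompress(mask, kept_values):
    """Server side: put the kept values back and zero-fill the rest."""
    restored = np.zeros(mask.shape, dtype=kept_values.dtype)
    restored[mask] = kept_values
    return restored


# Usage: a random tensor stands in for the device's intermediate features.
rng = np.random.default_rng(0)
features = rng.standard_normal((4, 8, 8))
mask, payload = compress(features, rng)
restored = decompress(mask, payload)
print(f"transmitted {payload.size} of {features.size} values")
```

In a real co-inference system the mask (or a compact encoding of it) would have to be transmitted alongside the payload, and the attention parameters would be trained jointly with the rest of the network; both are outside the scope of this sketch.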

Data availability

Not applicable.

Funding

This work is funded by the National Natural Science Foundation of China (No. 61972417).

Author information

Contributions

Shibao Li conceived the idea; Chenxu Ma and Yunwu Zhang wrote and tested the code; Longfei Li and Chengzhi Wang prepared the figures and tables; and all authors contributed to writing the article and reviewed the manuscript.

Corresponding author

Correspondence to Shibao Li.

Ethics declarations

Conflict of interest

We declare that the authors have no competing interests as defined by Springer, or other interests that might be perceived to influence the results and/or discussion reported in this paper.

Ethical approval

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Li, S., Ma, C., Zhang, Y. et al. Attention-based variable-size feature compression module for edge inference. J Supercomput 80, 8469–8484 (2024). https://doi.org/10.1007/s11227-023-05779-y

  • DOI: https://doi.org/10.1007/s11227-023-05779-y
