Abstract
Artificial intelligence has achieved significant breakthroughs in many fields, and the broad deployment of edge devices creates opportunities to develop and apply intelligent models in edge networks. Device-server co-inference has gradually become the mainstream paradigm of edge intelligent computing. However, existing feature-processing methods in edge inference frameworks overlook how important individual features are, so the processed features remain redundant and inference efficiency suffers. In this paper, we propose a novel attention-based variable-size feature compression module that improves the inference efficiency of edge systems by exploiting the varying importance of input data. First, a multi-scale attention mechanism operating jointly over the channel and spatial dimensions computes importance weights from the intermediate features output by the edge device. These weights are then used to assign different transmission probabilities, filtering out irrelevant feature data and prioritizing task-relevant information. Second, a new loss function and a progressive training strategy are designed to optimize the proposed module, enabling the model to adapt gradually and effectively to the reduced feature data. Finally, experimental results on the CIFAR-10 and ImageNet datasets demonstrate the effectiveness of the proposed solution: the data output volume of the edge device is significantly reduced and communication overhead is minimized while model accuracy is almost fully preserved.
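As a rough illustration of the idea described in the abstract, the PyTorch sketch below computes channel and spatial importance weights from an intermediate feature map and keeps only the highest-weighted activations for transmission to the server. This is a minimal sketch under stated assumptions, not the paper's implementation: the CBAM-style channel and spatial attention blocks, the keep_ratio parameter, the hard top-k masking rule, and the module name AttentionFeatureCompressor are all illustrative stand-ins for the paper's multi-scale attention and transmission-probability scheme.

import torch
import torch.nn as nn

class AttentionFeatureCompressor(nn.Module):
    """Sketch of an attention-guided feature compressor for device-edge
    co-inference: importance weights are computed jointly over the channel
    and spatial dimensions, then low-importance activations are zeroed out
    so that only task-relevant features need to be transmitted."""

    def __init__(self, channels: int, reduction: int = 16, keep_ratio: float = 0.25):
        super().__init__()
        # Channel attention: global average pooling followed by an excitation MLP.
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )
        # Spatial attention: 7x7 convolution over channel-wise mean and max maps.
        self.spatial_conv = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )
        self.keep_ratio = keep_ratio  # fraction of activations to transmit (assumed)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Per-channel importance in [0, 1].
        ch_w = self.channel_mlp(x.mean(dim=(2, 3))).view(b, c, 1, 1)
        # Per-location importance in [0, 1].
        sp_in = torch.cat([x.mean(dim=1, keepdim=True),
                           x.amax(dim=1, keepdim=True)], dim=1)
        sp_w = self.spatial_conv(sp_in)
        # Joint importance acts as a soft transmission probability.
        importance = ch_w * sp_w                      # broadcasts to [b, c, h, w]
        # Hard selection: keep the top keep_ratio fraction of activations.
        k = max(1, int(self.keep_ratio * c * h * w))
        flat = importance.view(b, -1)
        thresh = flat.topk(k, dim=1).values[:, -1].view(b, 1, 1, 1)
        mask = (importance >= thresh).float()
        # Only the surviving (masked-in) activations would be sent to the server.
        return x * importance * mask

In a split ResNet or VGG, such a module would sit at the partition point on the device side, and the keep_ratio could be scheduled downward during training to mimic the progressive strategy the abstract describes.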





Data availability
Not applicable.
Funding
This work was funded by the National Natural Science Foundation of China (No. 61972417).
Author information
Contributions
Shibao Li conceived the idea; Chenxu Ma and Yunwu Zhang wrote and tested the code; Longfei Li and Chengzhi Wang prepared all figures and tables; and all authors contributed to the writing and review of the manuscript.
Ethics declarations
Conflict of interest
The authors declare that they have no competing interests as defined by Springer, or other interests that might be perceived to influence the results and/or discussion reported in this paper.
Ethical approval
Not applicable.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Li, S., Ma, C., Zhang, Y. et al. Attention-based variable-size feature compression module for edge inference. J Supercomput 80, 8469–8484 (2024). https://doi.org/10.1007/s11227-023-05779-y