Abstract
As users demand higher inference accuracy, Deep Neural Network (DNN) models keep growing in the number of layers and neurons, which makes DNN inference increasingly demanding in computing power, storage, and other resources. At the network edge, a key way to keep such inference efficient is to partition a resource-intensive DNN inference task into multiple dependent subtasks and deploy them on different nodes. To address the fine-grained partitioning of DNN inference tasks with a directed acyclic graph (DAG) topology, this paper proposes a graph-cut-based method for DNN inference task partitioning and deployment. First, a distributed edge-terminal collaborative architecture is constructed to model the partitioning and deployment of DAG-structured DNN inference tasks. Next, the optimal partitioning and deployment problem is formulated with the objective of minimizing latency and energy consumption. Finally, graph-cut-based algorithms are designed for DNN inference task partitioning and for computing resource allocation. Experimental results show that the proposed method makes full use of the limited, distributed resources at the edge and effectively guarantees service timeliness.
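The abstract does not spell out how the graph is built or cut. For orientation only, the sketch below shows one common way to cast a two-way split of a DAG-structured DNN between a terminal device and an edge server as an s-t minimum cut; it is not the authors' algorithm. Assumptions: a single device and a single edge server (the paper targets multiple distributed nodes); each layer's cost is a scalar that could be latency or a weighted sum of latency and energy; total cost is a plain sum of compute and upload costs (parallel branches and queueing are ignored); the final result is not sent back; and once a layer runs on the edge, all of its successors stay there. The helpers `min_cut` and `partition_dag`, the layer names, and every number are hypothetical.

```python
from collections import deque

INF = float("inf")


def min_cut(cap, s, t):
    """Edmonds-Karp max flow on a dict-of-dicts capacity graph.

    Returns (cut value, nodes reachable from s in the final residual graph).
    Assumes every s-t path contains at least one finite-capacity edge.
    """
    res = {u: dict(nbrs) for u, nbrs in cap.items()}
    for u, nbrs in cap.items():
        for v in nbrs:
            res.setdefault(v, {}).setdefault(u, 0.0)  # reverse residual edge
    flow = 0.0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:  # BFS for a shortest augmenting path
            u = queue.popleft()
            for v, c in res[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow, set(parent)  # BFS exhausted: parent holds the source side
        bottleneck, v = INF, t
        while parent[v] is not None:
            bottleneck = min(bottleneck, res[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:  # push flow along the path
            u = parent[v]
            res[u][v] -= bottleneck
            res[v][u] += bottleneck
            v = u
        flow += bottleneck


def partition_dag(layers, edges, bandwidth):
    """Hypothetical device/edge split of a layer DAG via s-t minimum cut.

    layers: {name: (device_cost, edge_cost, output_size)}
    edges:  iterable of (u, v) meaning layer v consumes layer u's output
    Returns (total cost, set of layer names kept on the terminal device).
    """
    src, sink = "__device__", "__edge__"
    cap = {src: {}, sink: {}}

    def add(u, v, c):
        cap.setdefault(u, {})
        cap.setdefault(v, {})
        cap[u][v] = cap[u].get(v, 0.0) + c

    for name, (dev_cost, edge_cost, _) in layers.items():
        add(src, name, edge_cost)  # cut iff the layer runs on the edge
        add(name, sink, dev_cost)  # cut iff the layer runs on the device
    succs = {}
    for u, v in edges:
        succs.setdefault(u, []).append(v)
    for u, vs in succs.items():
        out = (u, "out")  # auxiliary node: u's output is uploaded at most once
        add(u, out, layers[u][2] / bandwidth)
        for v in vs:
            add(out, v, INF)
            add(v, u, INF)  # v on the device forces u on the device
    cost, source_side = min_cut(cap, src, sink)
    return cost, {n for n in layers if n in source_side}


# Hypothetical costs (ms) and output sizes (KB); bandwidth is in KB/ms.
# "input" is a zero-cost virtual layer pinned to the device (edge cost INF),
# so running conv1 on the edge means uploading the raw input.
LAYERS = {
    "input": (0.0, INF, 20.0),
    "conv1": (4.0, 1.0, 6.0),
    "conv2a": (6.0, 1.0, 2.0),
    "conv2b": (6.0, 1.0, 2.0),
    "add": (1.0, 0.5, 1.0),
    "fc": (3.0, 0.5, 0.1),
}
EDGES = [("input", "conv1"), ("conv1", "conv2a"), ("conv1", "conv2b"),
         ("conv2a", "add"), ("conv2b", "add"), ("add", "fc")]

if __name__ == "__main__":
    cost, on_device = partition_dag(LAYERS, EDGES, bandwidth=1.0)
    print(cost, sorted(on_device))
```

With these made-up numbers the cut keeps `input` and `conv1` on the device and sends the branching block to the edge (total cost 13), because conv1 shrinks the data that must be uploaded. The auxiliary `(u, "out")` node is what keeps a layer whose output feeds several edge-side successors from being charged the upload cost more than once.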
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Shi, X., Song, Y., Dai, M. (2024). Graph-Cut Based DNN Inference Task Partitioning and Deployment Method. In: Jin, H., Pan, Y., Lu, J. (eds) Computer Networks and IoT. IAIC 2023. Communications in Computer and Information Science, vol 2060. Springer, Singapore. https://doi.org/10.1007/978-981-97-1332-5_12
DOI: https://doi.org/10.1007/978-981-97-1332-5_12
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-1331-8
Online ISBN: 978-981-97-1332-5
eBook Packages: Computer Science, Computer Science (R0)