ABSTRACT
With the increasing demand for real-time video processing, optimizing the deployment of deep learning models is crucial for efficient execution on resource-constrained devices. This paper investigates how to determine the optimal split point for video inference models so as to improve throughput in a client-server setup. The split point is the layer index at which the model is divided between the client and the server. We propose an asynchronous execution approach in which the client runs the initial portion of the model and the server completes the remaining layers by processing the asynchronous execution requests sent by the client. This uses the available computational resources effectively and raises overall throughput. The primary factors considered in identifying the optimal split point are the floating-point operations (FLOPs) of the client-side model and the size of the data transmitted to the server. We explore various split points within the model architecture and analyze their impact on computation and communication overhead. Furthermore, we quantify the benefits of the asynchronous approach by comparing it against traditional synchronous execution. Our experimental results indicate that asynchronous execution achieves up to 32% higher throughput than synchronous execution across many state-of-the-art DNNs, demonstrating its potential for real-time video processing tasks.
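The trade-off described in the abstract between client-side FLOPs and transmitted data size can be illustrated with a toy cost model. The sketch below is not the paper's method. It assumes that synchronous execution costs the sum of the client, transfer, and server times per frame, and that asynchronous execution behaves like an ideal pipeline limited by its slowest stage. The `Layer` type, the function names, and all numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    flops: float      # FLOPs needed to run this layer once
    out_bytes: float  # size of the layer's output activation in bytes


def stage_times(layers, split, input_bytes, client_flops_s, server_flops_s, bytes_s):
    """Per-frame time of each stage when layers[:split] run on the client."""
    client = sum(l.flops for l in layers[:split]) / client_flops_s
    # With split == 0 the raw input is sent; otherwise the last client activation.
    sent = layers[split - 1].out_bytes if split > 0 else input_bytes
    transfer = sent / bytes_s
    server = sum(l.flops for l in layers[split:]) / server_flops_s
    return client, transfer, server


def throughput(times, asynchronous):
    """Frames per second under the assumed cost model (hypothetical)."""
    # Assumption: async stages overlap perfectly, so the slowest stage dominates;
    # sync execution waits for every stage in turn.
    per_frame = max(times) if asynchronous else sum(times)
    return 1.0 / per_frame


def best_split(layers, asynchronous, **env):
    """Split index in 0..len(layers) that maximizes modeled throughput."""
    return max(
        range(len(layers) + 1),
        key=lambda s: throughput(stage_times(layers, s, **env), asynchronous),
    )


# Hypothetical four-layer model and environment, for illustration only.
layers = [Layer(2e9, 4e6), Layer(1e9, 1e6), Layer(4e9, 2e5), Layer(3e9, 4e3)]
env = dict(input_bytes=6e6, client_flops_s=1e10, server_flops_s=1e11, bytes_s=1e7)

for mode in (False, True):
    s = best_split(layers, mode, **env)
    fps = throughput(stage_times(layers, s, **env), mode)
    print(f"asynchronous={mode}: split after layer {s}, {fps:.2f} frames/s")
```

Under this model a split that shrinks the transmitted activation without overloading the client wins, and the asynchronous estimate is never lower than the synchronous one for the same split, since the maximum of the stage times cannot exceed their sum. Real systems add queuing, serialization, and contention costs that this sketch ignores.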
Index Terms
- ASAP: Asynchronous Split Inference for Accelerated DNN Execution