
ASAP: Asynchronous Split Inference for Accelerated DNN Execution

Authors:

Waleed Hassan Mubark
Science and Engineering, University of Missouri-Kansas City, MO, USA and Management Information System, University of Jeddah, Saudi Arabia
ORCID: 0000-0001-7692-1496

Jagannath Guptha Kasula
Science and Engineering, University of Missouri-Kansas City, MO, USA
ORCID: 0009-0009-9081-3841

Md Yusuf Sarwar Uddin
Science and Engineering, University of Missouri-Kansas City, MO, USA
ORCID: 0000-0003-2184-0140

ICDCN '24: Proceedings of the 25th International Conference on Distributed Computing and Networking
January 2024
Pages 32–44
https://doi.org/10.1145/3631461.3631552

Published: 22 January 2024


ABSTRACT

With the growing demand for real-time video processing, optimizing the deployment of deep learning models is crucial for efficient execution on resource-constrained devices. This paper investigates the problem of determining the optimal split point for video inference models to maximize throughput in a client-server setup. The split point is the layer number at which the model is divided between the client and the server. We propose an asynchronous execution approach in which the client runs the initial portion of the model and the server completes the remaining layers by processing asynchronous execution requests sent by the client. This uses the available computational resources effectively and improves overall throughput. The primary factors considered in identifying the optimal split point are the floating-point operations (FLOPs) of the client-side model and the size of the data transmitted to the server. We explore various split points within the model architecture and analyze their impact on performance in terms of computation and communication overhead. Furthermore, we quantify the benefits of the asynchronous approach by comparing it against traditional synchronous execution. Our experimental results indicate that asynchronous execution achieves up to 32% higher throughput than synchronous execution across many state-of-the-art DNNs, demonstrating its potential for real-time video processing tasks.
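
The trade-off described in the abstract can be illustrated with a small sketch. It is not the authors' method: it assumes an idealized cost model in which a frame passes through three stages (client compute, transfer, server compute), synchronous execution pays the sum of the stage times per frame, and asynchronous execution overlaps the stages so that the slowest stage bounds steady-state throughput. The `Layer` class, the function names, and all layer and hardware numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    flops: float      # FLOPs needed to run this layer (hypothetical values)
    out_bytes: float  # size of this layer's output tensor in bytes


def stage_times(layers, split, client_flops_per_s, server_flops_per_s,
                bandwidth_bytes_per_s):
    """Per-frame (client, transfer, server) times when the client runs layers[:split]."""
    client = sum(l.flops for l in layers[:split]) / client_flops_per_s
    server = sum(l.flops for l in layers[split:]) / server_flops_per_s
    # The tensor sent over the network is the output of the last client-side layer.
    transfer = layers[split - 1].out_bytes / bandwidth_bytes_per_s
    return client, transfer, server


def sync_throughput(times):
    # Assumed synchronous model: the client waits for the reply before the next frame.
    return 1.0 / sum(times)


def async_throughput(times):
    # Assumed asynchronous model: stages overlap, so the slowest stage is the bottleneck.
    return 1.0 / max(times)


def best_split(layers, throughput_fn, **rates):
    # Consider only splits that leave at least one layer on each side.
    candidates = range(1, len(layers))
    return max(candidates,
               key=lambda s: throughput_fn(stage_times(layers, s, **rates)))


# Toy four-layer model and hardware rates, all made up for the example.
layers = [Layer(1e9, 4e6), Layer(2e9, 1e6), Layer(4e9, 2e5), Layer(8e9, 4e3)]
rates = dict(client_flops_per_s=1e10, server_flops_per_s=1e11,
             bandwidth_bytes_per_s=1e7)

for split in range(1, len(layers)):
    t = stage_times(layers, split, **rates)
    print(split, round(sync_throughput(t), 2), round(async_throughput(t), 2))
print("best sync split:", best_split(layers, sync_throughput, **rates))
print("best async split:", best_split(layers, async_throughput, **rates))
```

Under this simplified model, an early split keeps client FLOPs low but sends a large tensor, while a late split sends little data but overloads the client, which is why both factors enter the choice of split point. Real deployments would also need to account for queueing, serialization, and variable network conditions, none of which are modeled here.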



Published in

ICDCN '24: Proceedings of the 25th International Conference on Distributed Computing and Networking
January 2024
423 pages
ISBN: 9798400716737
DOI: 10.1145/3631461

Copyright © 2024 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
ACM proceedings
Split DNN
Split inference
asynchronous inferencing
video inferencing
Qualifiers
- research-article
- Research
- Refereed limited