DOI: 10.1145/3631461.3631552
research-article

ASAP: Asynchronous Split Inference for Accelerated DNN Execution

Published: 22 January 2024

ABSTRACT

With the increasing demand for real-time video processing in various applications, optimizing the deployment of deep learning models is crucial for efficient execution on resource-constrained devices. This paper investigates the problem of determining the optimal split point for video inference models to improve throughput in a client-server setup. The split point is the layer number at which the model is divided between the client and the server. We propose an asynchronous execution approach in which the client handles the initial portion of the video inference, while the server completes the subsequent stages by processing the asynchronous execution requests sent by the client. This uses the available computational resources effectively and improves overall throughput. The primary factors considered in identifying the optimal split point are the floating-point operations (FLOPs) of the client model and the size of the data transmitted to the server. We explore various split points within the model architecture and analyze their impact on performance in terms of computation and communication overhead. Furthermore, we quantify the benefits of the asynchronous approach by comparing it against traditional synchronous execution. Our experimental results indicate that asynchronous execution achieves up to 32% higher throughput than synchronous execution across many state-of-the-art DNNs, demonstrating its potential for real-time video processing tasks.
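The trade-off described in the abstract can be illustrated with a simplified analytical model. The sketch below is not the authors' method. It assumes an idealized three-stage pipeline (client compute, network transfer, server compute) in which synchronous execution pays the sum of the stage times per frame, while asynchronous execution overlaps the stages perfectly and is limited only by the slowest one. The `Layer` type, the per-layer FLOP counts and output sizes, and the device and link speeds are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Layer:
    flops: float      # FLOPs needed to run this layer (hypothetical)
    out_bytes: float  # size of this layer's output tensor in bytes (hypothetical)


def stage_times(layers: List[Layer], split: int, client_flops_per_s: float,
                server_flops_per_s: float, link_bytes_per_s: float
                ) -> Tuple[float, float, float]:
    """Seconds per frame for (client, transfer, server) when
    layers[:split] run on the client and layers[split:] on the server."""
    client = sum(l.flops for l in layers[:split]) / client_flops_per_s
    transfer = layers[split - 1].out_bytes / link_bytes_per_s
    server = sum(l.flops for l in layers[split:]) / server_flops_per_s
    return client, transfer, server


def sync_throughput(times: Tuple[float, float, float]) -> float:
    # Synchronous: each frame waits for all three stages in sequence.
    return 1.0 / sum(times)


def async_throughput(times: Tuple[float, float, float]) -> float:
    # Asynchronous (idealized): stages overlap, so the slowest stage
    # determines the steady-state frame rate.
    return 1.0 / max(times)


def best_split(layers: List[Layer], client_flops_per_s: float,
               server_flops_per_s: float, link_bytes_per_s: float) -> int:
    """Split point (1 .. len(layers) - 1) with the highest modeled async throughput."""
    candidates = range(1, len(layers))
    return max(candidates, key=lambda s: async_throughput(
        stage_times(layers, s, client_flops_per_s,
                    server_flops_per_s, link_bytes_per_s)))


# Hypothetical four-layer model and hardware, for illustration only.
LAYERS = [Layer(2e9, 4e6), Layer(1e9, 1e6), Layer(3e9, 2e5), Layer(4e9, 4e3)]
CLIENT_FLOPS_PER_S = 1e10
SERVER_FLOPS_PER_S = 1e11
LINK_BYTES_PER_S = 1e7

if __name__ == "__main__":
    for s in range(1, len(LAYERS)):
        t = stage_times(LAYERS, s, CLIENT_FLOPS_PER_S,
                        SERVER_FLOPS_PER_S, LINK_BYTES_PER_S)
        print(f"split={s} sync={sync_throughput(t):.2f} fps "
              f"async={async_throughput(t):.2f} fps")
    print("best split:", best_split(LAYERS, CLIENT_FLOPS_PER_S,
                                    SERVER_FLOPS_PER_S, LINK_BYTES_PER_S))
```

In this model an early split keeps client FLOPs low but sends a large intermediate tensor, while a late split does the opposite, which is why both factors matter when choosing the split point. Real systems will fall short of the idealized overlap because of queuing, serialization, and contention.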

Published in

ICDCN '24: Proceedings of the 25th International Conference on Distributed Computing and Networking
January 2024
423 pages
ISBN: 9798400716737
DOI: 10.1145/3631461

            Copyright © 2024 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

            Publisher

            Association for Computing Machinery

            New York, NY, United States


            Qualifiers

            • research-article
            • Research
            • Refereed limited