DOI: 10.1145/3581783.3613785
research-article

ENTRO: Tackling the Encoding and Networking Trade-off in Offloaded Video Analytics

Published: 27 October 2023

ABSTRACT

With the rapid advances in deep learning and the commercialization of high-definition cameras in mobile and embedded devices, demand for high-quality video analytics (HVA) from latency-critical applications such as AR and XR is soaring. Because HVA aims to enable detailed analytics even for small objects, on-device implementations suffer from thermal and battery issues, which makes offloaded HVA an attractive solution. This work provides unique observations on a tradeoff in offloaded HVA among three quantities: the frame encoding time, the frame transmission time, and the HVA accuracy. These observations pose a fundamental question: given a latency budget, which encoding option balances encoding time against transmission time so as to maximize HVA accuracy? To answer this question, we propose ENTRO, an offloaded HVA system that exploits this tradeoff in real time to maximize HVA accuracy under the latency budget. Our extensive evaluations with ENTRO implemented on Nvidia AGX Xavier and Samsung Galaxy S20 Ultra over WiFi networks show an 8.8× improvement in latency without accuracy loss compared to DDS, the state-of-the-art offloaded video analytics system. Our evaluation over commercial 5G and LTE networks also indicates that ENTRO flexibly adapts its encoding option under the tradeoff and enables latency-bounded HVA with 4K frames.
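
The question posed in the abstract can be read as a small constrained selection problem: among the available encoding options, pick the most accurate one whose encoding time plus transmission time fits the latency budget. The Python sketch below is only an illustration under stated assumptions and is not ENTRO's actual algorithm. The `EncodingOption` fields, the `pick_option` helper, and every number in `OPTIONS` are hypothetical, and the sketch ignores server-side inference time and bandwidth fluctuation.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class EncodingOption:
    """Hypothetical profile of one encoding option (illustrative values only)."""
    name: str
    encode_ms: float    # time to encode one frame on the device
    frame_kbits: float  # size of the encoded frame
    accuracy: float     # expected analytics accuracy at this quality


def pick_option(options: Sequence[EncodingOption],
                uplink_kbps: float,
                budget_ms: float) -> Optional[EncodingOption]:
    """Return the most accurate option whose encode + transmit time fits the budget.

    Returns None when no option meets the latency budget.
    """
    best = None
    for opt in options:
        # Transmission time of one frame at the assumed uplink rate.
        transmit_ms = opt.frame_kbits / uplink_kbps * 1000.0
        if opt.encode_ms + transmit_ms > budget_ms:
            continue  # this option would miss the latency budget
        if best is None or opt.accuracy > best.accuracy:
            best = opt
    return best


# Made-up profiles: a fast encoder that yields large frames, a slow encoder
# that yields small frames, and a heavily compressed low-accuracy fallback.
OPTIONS = [
    EncodingOption("fast-large", encode_ms=5.0, frame_kbits=8000.0, accuracy=0.90),
    EncodingOption("slow-small", encode_ms=40.0, frame_kbits=2000.0, accuracy=0.88),
    EncodingOption("tiny", encode_ms=10.0, frame_kbits=500.0, accuracy=0.60),
]
```

With these made-up profiles and a 100 ms budget, a fast uplink favors the quickly encoded but larger frame, while a slower uplink makes it worthwhile to spend more time encoding in order to send fewer bits. This is the kind of shift between encoding time and transmission time that the abstract describes.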

      • Published in

        MM '23: Proceedings of the 31st ACM International Conference on Multimedia
        October 2023
        9913 pages
        ISBN: 9798400701085
        DOI: 10.1145/3581783

        Copyright © 2023 ACM

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Acceptance Rates

        Overall Acceptance Rate: 995 of 4,171 submissions, 24%
