Niagara: Scheduling DNN Inference Services on Heterogeneous Edge Processors

  • Conference paper
Service-Oriented Computing (ICSOC 2023)

Abstract

Intelligent applications rely heavily on deep neural network (DNN) inference services executed on edge devices to meet functional requirements while safeguarding user data privacy. However, executing such DNN services on resource-constrained edge devices poses a significant challenge: low throughput of inference tasks. To this end, this paper proposes Niagara, a novel system designed to maximize system throughput by judiciously scheduling DNN inference services on the heterogeneous processors available on edge devices. Niagara faces two critical challenges: uncertain workload dynamics and high scheduling complexity. To address these challenges, Niagara employs a predictive model to anticipate incoming workload patterns and orchestrates the allocation of services across heterogeneous processors through a combination of offline scheduling optimization and online service dispatching strategies. We have implemented Niagara and conducted thorough experiments. The results demonstrate that Niagara surpasses state-of-the-art approaches by raising DNN inference throughput by up to 4.67\(\times\) while satisfying the same stringent inference latency requirements. Furthermore, Niagara has been successfully deployed in real-world power supply substations to detect violations, ensuring uninterrupted, accident-free operation during its six-month deployment period.

Acknowledgement

This work was supported by the National Key Research and Development Program of China under grant number 2022YFB4500700; the National Natural Science Foundation of China under grant numbers 62325201, 62172008, 62102009, and 62102045; the National Natural Science Fund for the Excellent Young Scientists Fund Program (Overseas); the China Postdoctoral Science Foundation under grant number 8206300713; the Beijing Outstanding Young Scientist Program under grant number BJJWZYJH01201910001004; and the Center for Data Space Technology and System, Peking University.

Author information

Corresponding authors

Correspondence to Mengwei Xu, Xin Jin, or Yun Ma.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Xu, D. et al. (2023). Niagara: Scheduling DNN Inference Services on Heterogeneous Edge Processors. In: Monti, F., Rinderle-Ma, S., Ruiz Cortés, A., Zheng, Z., Mecella, M. (eds) Service-Oriented Computing. ICSOC 2023. Lecture Notes in Computer Science, vol 14419. Springer, Cham. https://doi.org/10.1007/978-3-031-48421-6_6

  • DOI: https://doi.org/10.1007/978-3-031-48421-6_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-48420-9

  • Online ISBN: 978-3-031-48421-6

  • eBook Packages: Computer Science, Computer Science (R0)
