Niagara: Scheduling DNN Inference Services on Heterogeneous Edge Processors

Xu, Daliang; Li, Qing; Xu, Mengwei; Huang, Kang; Huang, Gang; Wang, Shangguang; Jin, Xin; Ma, Yun; Liu, Xuanzhe

doi:10.1007/978-3-031-48421-6_6

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14419)

Included in the following conference series:

International Conference on Service-Oriented Computing

Abstract

Intelligent applications heavily rely on deep neural network (DNN) inference services executed on edge devices to fulfill functional prerequisites while safeguarding user data privacy. However, the execution of such DNN services on resource-constrained edge devices poses a significant challenge: low throughput of inference tasks. To this end, this paper proposes Niagara, a novel system designed to maximize system throughput by judiciously scheduling DNN inference services on heterogeneous processors available on edge devices. Niagara faces two critical challenges: uncertain workload dynamics and high scheduling complexity. To effectively address these challenges, Niagara employs a predictive model to anticipate incoming workload patterns and orchestrates the allocation of services across heterogeneous processors through a combination of offline scheduling optimization and online service dispatching strategies. We have implemented Niagara and conducted thorough experiments. The results demonstrate that Niagara surpasses state-of-the-art approaches by elevating DNN inference throughput by up to 4.67×, all while satisfying the same stringent inference latency requirements. Furthermore, Niagara has been successfully deployed in real-world power supply substations to detect violations, ensuring uninterrupted, accident-free operation during its six-month deployment period.
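
The abstract names three stages: workload prediction, offline scheduling optimization, and online dispatching. The following is a minimal illustrative sketch of how such a pipeline could fit together, not the paper's actual algorithm. The moving-average forecaster, the greedy placement rule, and all names (`forecast_rate`, `plan_offline`, `dispatch`, the service and processor labels, the capacity numbers) are assumptions made for illustration only.

```python
def forecast_rate(history, window=3):
    """Moving-average forecast of the next-interval request rate (requests/s).

    Hypothetical stand-in for a predictive workload model.
    """
    recent = list(history)[-window:]
    return sum(recent) / len(recent)


def plan_offline(predicted_rates, capacity):
    """Greedy offline plan (an assumed heuristic, not an optimal solver).

    predicted_rates: {service: predicted requests/s}
    capacity: {processor: {service: max requests/s that processor sustains}}
    Returns (plan, load) where plan maps service -> processor and
    load maps processor -> projected utilization (1.0 means saturated).
    """
    load = {p: 0.0 for p in capacity}
    plan = {}
    # Place the heaviest services first so they get the best-suited processor.
    for svc, rate in sorted(predicted_rates.items(), key=lambda kv: -kv[1]):
        # Pick the processor whose utilization stays lowest after placement.
        best = min(capacity, key=lambda p: load[p] + rate / capacity[p][svc])
        load[best] += rate / capacity[best][svc]
        plan[svc] = best
    return plan, load


def dispatch(service, plan):
    """Online dispatch: a constant-time lookup into the precomputed plan."""
    return plan[service]


# Hypothetical request-rate histories and per-processor capacities.
histories = {"detect": [10, 20, 30], "classify": [4, 4, 4]}
capacity = {
    "cpu": {"detect": 10.0, "classify": 20.0},
    "gpu": {"detect": 40.0, "classify": 40.0},
}

predicted = {svc: forecast_rate(h) for svc, h in histories.items()}
plan, load = plan_offline(predicted, capacity)
```

In this toy setup the heavier `detect` service lands on the faster processor and `classify` falls back to the CPU. Keeping the expensive planning step offline and reducing the online step to a lookup is one plausible way to read the offline/online split described in the abstract.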

Acknowledgement

This work was supported by the National Key Research and Development Program of China under grant number 2022YFB4500700, the National Natural Science Foundation of China under grant numbers 62325201, 62172008, 62102009, and 62102045, the National Natural Science Fund for the Excellent Young Scientists Fund Program (Overseas), the China Postdoctoral Science Foundation 8206300713, the Beijing Outstanding Young Scientist Program under grant number BJJWZYJH01201910001004, and the Center for Data Space Technology and System, Peking University.

Author information

Authors and Affiliations

Key Laboratory of High Confidence Software Technologies (Peking University), Ministry of Education, School of Computer Science, Peking University, Beijing, China
Daliang Xu, Qing Li, Gang Huang, Xin Jin & Xuanzhe Liu
State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, China
Mengwei Xu & Shangguang Wang
Linggui Tech Company, Beijing, China
Kang Huang
Institute for Artificial Intelligence, Peking University, Beijing, China
Yun Ma
National Key Laboratory of Data Space Technology and System, Beijing, China
Gang Huang

Corresponding authors

Correspondence to Mengwei Xu, Xin Jin or Yun Ma.

Editor information

Editors and Affiliations

Sapienza University of Rome, Rome, Italy
Flavia Monti
Technical University of Munich, Garching, Germany
Stefanie Rinderle-Ma
University of Seville, Seville, Spain
Antonio Ruiz Cortés
Sun Yat-sen University, Guangzhou, China
Zibin Zheng
Sapienza University of Rome, Rome, Italy
Massimo Mecella

Cite this paper

Xu, D. et al. (2023). Niagara: Scheduling DNN Inference Services on Heterogeneous Edge Processors. In: Monti, F., Rinderle-Ma, S., Ruiz Cortés, A., Zheng, Z., Mecella, M. (eds) Service-Oriented Computing. ICSOC 2023. Lecture Notes in Computer Science, vol 14419. Springer, Cham. https://doi.org/10.1007/978-3-031-48421-6_6

DOI: https://doi.org/10.1007/978-3-031-48421-6_6
Published: 20 November 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-48420-9
Online ISBN: 978-3-031-48421-6
eBook Packages: Computer Science, Computer Science (R0)
