Abstract
Intelligent applications rely heavily on deep neural network (DNN) inference services executed on edge devices to meet functional requirements while safeguarding user data privacy. However, running such DNN services on resource-constrained edge devices poses a significant challenge: low inference throughput. To address it, this paper proposes Niagara, a novel system that maximizes throughput by judiciously scheduling DNN inference services across the heterogeneous processors available on edge devices. Niagara must overcome two critical challenges: uncertain workload dynamics and high scheduling complexity. To do so, it employs a predictive model to anticipate incoming workload patterns and allocates services across heterogeneous processors through a combination of offline scheduling optimization and online service dispatching. We have implemented Niagara and conducted thorough experiments. The results demonstrate that Niagara outperforms state-of-the-art approaches, raising DNN inference throughput by up to 4.67\(\times\) while satisfying the same stringent inference latency requirements. Furthermore, Niagara has been successfully deployed in real-world power supply substations to detect violations, ensuring uninterrupted, accident-free operation during its six-month deployment period.
Acknowledgement
This work was supported by the National Key Research and Development Program of China under grant number 2022YFB4500700, the National Natural Science Foundation of China under grant numbers 62325201, 62172008, 62102009, and 62102045, the National Natural Science Fund for the Excellent Young Scientists Fund Program (Overseas), the China Postdoctoral Science Foundation 8206300713, the Beijing Outstanding Young Scientist Program under grant number BJJWZYJH01201910001004, and the Center for Data Space Technology and System, Peking University.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Xu, D. et al. (2023). Niagara: Scheduling DNN Inference Services on Heterogeneous Edge Processors. In: Monti, F., Rinderle-Ma, S., Ruiz Cortés, A., Zheng, Z., Mecella, M. (eds) Service-Oriented Computing. ICSOC 2023. Lecture Notes in Computer Science, vol 14419. Springer, Cham. https://doi.org/10.1007/978-3-031-48421-6_6
DOI: https://doi.org/10.1007/978-3-031-48421-6_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-48420-9
Online ISBN: 978-3-031-48421-6
eBook Packages: Computer Science, Computer Science (R0)