Abstract
With the rapid development of space-air-ground computing, large UAVs, airships, high-altitude platform stations (HAPS), and satellites increasingly carry powerful computation resources (e.g., heterogeneous types of GPUs) and can act as edge servers in the air. They are increasingly used for deep neural network (DNN) inference applications such as disaster monitoring, remote sensing, and agricultural inspection. However, these airborne edge servers have a very limited energy supply, so reducing their energy consumption to extend their working hours, while still meeting the delay requirements of DNN inference tasks, is an important problem.
In this paper, we propose MagicBatch, an energy-aware scheduling framework for DNN inference workloads on airborne edge servers with heterogeneous GPUs. MagicBatch builds on our key finding that different GPUs exhibit different energy and latency performance under different DNN inference batch sizes. Accordingly, MagicBatch operates in two phases: in the offline analysis phase, it profiles the execution latency and energy consumption of different DNN inference tasks on heterogeneous GPUs; in the online scheduling phase, a heuristic energy-aware scheduling algorithm (PSO-GA) allocates heterogeneous GPU computing resources to the various inference tasks. Evaluation on our emulation testbed shows that MagicBatch achieves more than 31.3% energy savings and 41.1% throughput improvement over state-of-the-art methods.
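To make the two-phase idea concrete, the following is a minimal illustrative sketch (not the paper's implementation): it assumes offline-profiled latency/energy tables per (GPU type, batch size) and, for each GPU, picks the batch size that minimizes energy per inference while meeting a latency requirement. All profile numbers and names here are made-up placeholders; the actual framework uses a PSO-GA heuristic over the full allocation problem rather than this per-GPU greedy choice.

```python
# Hypothetical offline profiles: gpu_type -> {batch_size: (latency_ms, energy_mJ_per_batch)}.
# In MagicBatch these tables would come from the offline analysis phase.
PROFILES = {
    "gpu_a": {1: (5.0, 40.0), 4: (9.0, 90.0), 8: (16.0, 150.0)},
    "gpu_b": {1: (8.0, 30.0), 4: (14.0, 70.0), 8: (26.0, 120.0)},
}

def pick_batch(gpu_type, latency_slo_ms):
    """Return (batch_size, energy_per_inference) minimizing energy per
    inference on the given GPU, subject to the latency requirement,
    or None if no profiled batch size meets the deadline."""
    best = None
    for batch, (lat_ms, energy_mj) in PROFILES[gpu_type].items():
        if lat_ms > latency_slo_ms:
            continue  # this batch size violates the delay requirement
        per_inference = energy_mj / batch  # larger batches amortize energy
        if best is None or per_inference < best[1]:
            best = (batch, per_inference)
    return best

print(pick_batch("gpu_a", 20.0))  # → (8, 18.75): big batch is most efficient
print(pick_batch("gpu_b", 10.0))  # → (1, 30.0): tight deadline forces batch 1
```

The sketch shows the core trade-off the paper exploits: larger batches usually cost less energy per inference but incur higher latency, and the break-even point differs across heterogeneous GPUs.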
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Liu, D., Ma, Z., Zhang, A., Zheng, K. (2023). MagicBatch: An Energy-Aware Scheduling Framework for DNN Inference on Heterogeneous Edge Servers in Space-Air-Ground Computation. In: Hsu, CH., Xu, M., Cao, H., Baghban, H., Shawkat Ali, A.B.M. (eds) Big Data Intelligence and Computing. DataCom 2022. Lecture Notes in Computer Science, vol 13864. Springer, Singapore. https://doi.org/10.1007/978-981-99-2233-8_30
Print ISBN: 978-981-99-2232-1
Online ISBN: 978-981-99-2233-8