Abstract
With the rapid development of space-air-ground computing, large UAVs, airships, high-altitude platform stations (HAPS), and satellites increasingly carry powerful computation resources (e.g., heterogeneous types of GPUs) and can act as edge servers in the air. They are increasingly used for deep neural network (DNN) inference applications such as disaster monitoring, remote sensing, and agricultural inspection. However, these airborne edge servers have a very limited energy supply, so reducing their energy consumption to extend their working hours, while still meeting the delay requirements of DNN inference tasks, is an important problem.
In this paper, we propose MagicBatch, an energy-aware scheduling framework for DNN inference workloads on airborne edge servers with heterogeneous GPUs. MagicBatch builds on our key finding that different GPUs exhibit different energy and latency performance under different DNN inference batch sizes. Accordingly, MagicBatch operates in two phases: in the offline analysis phase, it profiles the execution latency and energy consumption of different DNN inference tasks on heterogeneous GPUs; in the online scheduling phase, a heuristic energy-aware scheduling algorithm (PSO-GA) allocates heterogeneous GPU computing resources to the various inference tasks. Evaluation on our emulation testbed shows that MagicBatch achieves more than 31.3% energy savings and 41.1% throughput improvement over state-of-the-art methods.
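To make the two-phase idea concrete, the following is a minimal illustrative sketch (not the paper's implementation): it assumes offline-profiled latency/energy tables per (GPU type, batch size) and, for each GPU, picks the batch size that minimizes energy per inference while meeting a latency requirement. All profile numbers and names here are made-up placeholders; the actual framework uses a PSO-GA heuristic over the full allocation problem rather than this per-GPU greedy choice.

```python
# Hypothetical offline profiles: gpu_type -> {batch_size: (latency_ms, energy_mJ_per_batch)}.
# In MagicBatch these tables would come from the offline analysis phase.
PROFILES = {
    "gpu_a": {1: (5.0, 40.0), 4: (9.0, 90.0), 8: (16.0, 150.0)},
    "gpu_b": {1: (8.0, 30.0), 4: (14.0, 70.0), 8: (26.0, 120.0)},
}

def pick_batch(gpu_type, latency_slo_ms):
    """Return (batch_size, energy_per_inference) minimizing energy per
    inference on the given GPU, subject to the latency requirement,
    or None if no profiled batch size meets the deadline."""
    best = None
    for batch, (lat_ms, energy_mj) in PROFILES[gpu_type].items():
        if lat_ms > latency_slo_ms:
            continue  # this batch size violates the delay requirement
        per_inference = energy_mj / batch  # larger batches amortize energy
        if best is None or per_inference < best[1]:
            best = (batch, per_inference)
    return best

print(pick_batch("gpu_a", 20.0))  # → (8, 18.75): big batch is most efficient
print(pick_batch("gpu_b", 10.0))  # → (1, 30.0): tight deadline forces batch 1
```

The sketch shows the core trade-off the paper exploits: larger batches usually cost less energy per inference but incur higher latency, and the break-even point differs across heterogeneous GPUs.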
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Liu, D., Ma, Z., Zhang, A., Zheng, K. (2023). MagicBatch: An Energy-Aware Scheduling Framework for DNN Inference on Heterogeneous Edge Servers in Space-Air-Ground Computation. In: Hsu, CH., Xu, M., Cao, H., Baghban, H., Shawkat Ali, A.B.M. (eds) Big Data Intelligence and Computing. DataCom 2022. Lecture Notes in Computer Science, vol 13864. Springer, Singapore. https://doi.org/10.1007/978-981-99-2233-8_30
Print ISBN: 978-981-99-2232-1
Online ISBN: 978-981-99-2233-8