MagicBatch: An Energy-Aware Scheduling Framework for DNN Inference on Heterogeneous Edge Servers in Space-Air-Ground Computation

  • Conference paper

Big Data Intelligence and Computing (DataCom 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13864)

Abstract

With the rapid development of space-air-ground computing, large UAVs, airships or HAPS (high-altitude platform stations), and satellites increasingly carry powerful computation resources (e.g., heterogeneous types of GPUs) and can act as edge servers in the air. They are widely used for deep neural network (DNN) inference applications such as disaster monitoring, remote sensing, and agriculture inspection. However, these airborne edge servers typically have a very limited energy supply. Reducing their energy consumption to extend their working hours, while still meeting the delay requirements of DNN inference tasks, is therefore an important challenge.

In this paper, we propose MagicBatch, an energy-aware scheduling framework for DNN inference workloads on airborne edge servers with heterogeneous GPUs. MagicBatch is based on our key finding that different GPUs exhibit different energy and latency characteristics under different DNN inference batch sizes. MagicBatch therefore operates in two phases: in the offline analysis phase, it profiles the execution latency and energy consumption of different DNN inference tasks on heterogeneous GPUs; in the online scheduling phase, a heuristic energy-aware scheduling algorithm (PSO-GA) allocates heterogeneous GPU computing resources to the various inference tasks. Evaluation on our emulation testbed shows that MagicBatch achieves more than 31.3% energy savings and 41.1% throughput improvement compared with state-of-the-art methods.
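
The abstract does not spell out the scheduler's internals, but the two-phase structure can be illustrated with a short sketch. The Python below is a minimal, hypothetical rendering: an offline profile table mapping (model, GPU, batch size) to measured latency and energy, and a discrete PSO-GA-style search that assigns each task a GPU and batch size to minimize total energy under latency deadlines. All identifiers (PROFILE, TASKS, pso_ga), the placeholder measurements, and the specific crossover/mutation operators are assumptions for illustration, not the authors' implementation.

```python
import random

# ---- Offline analysis phase (assumed profile format) -----------------------
# Measured latency (ms) and energy (mJ) per (model, gpu, batch_size).
# The numbers below are placeholders, not measurements from the paper.
GPUS = ["gpu_a", "gpu_b"]
BATCHES = [8, 32]
PROFILE = {
    ("resnet50", "gpu_a", 8):  (21.0, 180.0),
    ("resnet50", "gpu_a", 32): (60.0, 410.0),
    ("resnet50", "gpu_b", 8):  (15.0, 260.0),
    ("resnet50", "gpu_b", 32): (42.0, 650.0),
}

TASKS = [  # (model, latency_deadline_ms)
    ("resnet50", 50.0),
    ("resnet50", 25.0),
    ("resnet50", 70.0),
]

# ---- Online scheduling phase (assumed PSO-GA discretization) ---------------
def fitness(assign):
    """Total energy of an assignment, with a large penalty per missed deadline."""
    cost = 0.0
    for (model, deadline), (gpu, batch) in zip(TASKS, assign):
        lat, energy = PROFILE[(model, gpu, batch)]
        cost += energy
        if lat > deadline:
            cost += 1e6  # infeasible: deadline violated
    return cost

def random_assign():
    """One particle: a (gpu, batch) gene for every task."""
    return [(random.choice(GPUS), random.choice(BATCHES)) for _ in TASKS]

def crossover(a, b):
    """Uniform crossover: each gene taken from either parent."""
    return [ga if random.random() < 0.5 else gb for ga, gb in zip(a, b)]

def mutate(a, rate=0.1):
    """Randomly re-draw a gene with probability `rate`."""
    return [(random.choice(GPUS), random.choice(BATCHES))
            if random.random() < rate else gene for gene in a]

def pso_ga(pop_size=20, iters=100):
    """Discrete PSO with GA operators: a particle moves toward its personal
    best and the global best via crossover, then mutates for exploration."""
    pop = [random_assign() for _ in range(pop_size)]
    pbest = list(pop)
    gbest = min(pop, key=fitness)
    for _ in range(iters):
        for i, p in enumerate(pop):
            cand = mutate(crossover(crossover(p, pbest[i]), gbest))
            pop[i] = cand
            if fitness(cand) < fitness(pbest[i]):
                pbest[i] = cand
        gbest = min(pbest, key=fitness)
    return gbest, fitness(gbest)

if __name__ == "__main__":
    best, cost = pso_ga()
    print("assignment:", best, "energy + penalty:", cost)
```

The hybrid operator here follows a common PSO-GA pattern: continuous velocity updates are replaced by crossover toward the personal and global bests, with mutation providing exploration. Whether MagicBatch uses exactly this discretization is not stated in the abstract.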

Author information

Corresponding author

Correspondence to Kuangyu Zheng.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Liu, D., Ma, Z., Zhang, A., Zheng, K. (2023). MagicBatch: An Energy-Aware Scheduling Framework for DNN Inference on Heterogeneous Edge Servers in Space-Air-Ground Computation. In: Hsu, CH., Xu, M., Cao, H., Baghban, H., Shawkat Ali, A.B.M. (eds) Big Data Intelligence and Computing. DataCom 2022. Lecture Notes in Computer Science, vol 13864. Springer, Singapore. https://doi.org/10.1007/978-981-99-2233-8_30

  • DOI: https://doi.org/10.1007/978-981-99-2233-8_30

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-99-2232-1

  • Online ISBN: 978-981-99-2233-8

  • eBook Packages: Computer Science; Computer Science (R0)
