Abstract
Deep learning (DL) applications have become increasingly popular in recent years, leading to the development of specialized hardware accelerators such as FPGAs and GPUs. These accelerators provide significant performance gains over traditional CPUs, but using them efficiently requires careful scheduling configuration for the given DL requests. In this paper, we propose an SLO-aware DL job scheduling model for efficient FPGA-GPU edge cloud computing. The proposed model accounts for the time-variant service-level objectives (SLOs) of each DL job and periodically updates the accelerator configuration for DL processing while minimizing computation cost. We first analyze the impact of various DL-related parameters on the performance of FPGA-GPU computing. We then propose a novel scheduling algorithm that considers time-variant latency SLO constraints and periodically updates the scheduling configuration. We evaluate our scheduler with several DL workloads on an FPGA-GPU cluster. The results show that our scheduler improves both energy consumption and SLO compliance compared to a conventional DL scheduling approach.
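The abstract only sketches the core mechanism: at each scheduling period, the scheduler re-reads each job's latency SLO and selects the accelerator configuration that minimizes cost while staying within that SLO. The Python sketch below illustrates this general idea under stated assumptions; the profiling table, job names, and the feasible-then-cheapest selection rule are illustrative placeholders, not the paper's actual algorithm or measurements.

```python
# Minimal sketch of a periodic, SLO-aware scheduler for an FPGA-GPU edge cluster.
# The profiling numbers, job names, and greedy selection rule are illustrative
# assumptions, not the authors' actual method or measured data.
from dataclasses import dataclass


@dataclass
class Config:
    accelerator: str      # "fpga" or "gpu"
    batch_size: int
    latency_ms: float     # profiled per-request latency for this configuration
    energy_j: float       # profiled per-request energy cost


@dataclass
class Job:
    name: str
    slo_ms: float         # current latency SLO; may change between periods


# Hypothetical offline profiling table: candidate configurations per job.
PROFILE = {
    "resnet50": [
        Config("fpga", 1, 18.0, 0.9),
        Config("gpu", 4, 9.0, 2.1),
        Config("gpu", 16, 25.0, 1.4),
    ],
    "bert-base": [
        Config("fpga", 1, 40.0, 1.6),
        Config("gpu", 8, 22.0, 3.0),
    ],
}


def schedule(jobs: list[Job]) -> dict[str, Config]:
    """Pick, for each job, the cheapest profiled configuration that meets its SLO.

    Falls back to the lowest-latency configuration when no candidate satisfies it.
    """
    plan = {}
    for job in jobs:
        candidates = PROFILE[job.name]
        feasible = [c for c in candidates if c.latency_ms <= job.slo_ms]
        if feasible:
            plan[job.name] = min(feasible, key=lambda c: c.energy_j)
        else:
            plan[job.name] = min(candidates, key=lambda c: c.latency_ms)
    return plan


if __name__ == "__main__":
    # SLOs are re-read every scheduling period, so the plan can change over time.
    jobs = [Job("resnet50", slo_ms=20.0), Job("bert-base", slo_ms=30.0)]
    for name, cfg in schedule(jobs).items():
        print(f"{name}: {cfg.accelerator} (batch {cfg.batch_size}, "
              f"{cfg.latency_ms} ms, {cfg.energy_j} J/request)")
```

Run once per scheduling period with the jobs' latest SLOs; the same greedy pass then yields an updated accelerator plan whenever the SLOs or the profiled costs change.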
Acknowledgements
This work is supported by Samsung Electronics Co., Ltd.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Kim, T., Jeon, M., Lee, C., Kim, S., AL-Hazemi, F., Youn, CH. (2024). SLO-Aware DL Job Scheduling for Efficient FPGA-GPU Edge Cloud Computing. In: Casteleyn, S., Mikkonen, T., García Simón, A., Ko, IY., Loseto, G. (eds) Current Trends in Web Engineering. ICWE 2023. Communications in Computer and Information Science, vol 1898. Springer, Cham. https://doi.org/10.1007/978-3-031-50385-6_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-50384-9
Online ISBN: 978-3-031-50385-6
eBook Packages: Computer Science (R0)