Abstract
Deep learning (DL) applications have become increasingly popular in recent years, leading to the development of specialized hardware accelerators such as FPGAs and GPUs. These accelerators provide significant performance gains over traditional CPUs, but using them efficiently requires careful scheduling configuration for the given DL requests. In this paper, we propose an SLO-aware DL job scheduling model for efficient FPGA-GPU edge cloud computing. The proposed model accounts for the time-variant service-level objectives (SLOs) of each DL job and periodically updates the accelerator configuration for DL processing while minimizing computation cost. We first analyze the impact of various DL-related parameters on the performance of FPGA-GPU computing. We then propose a novel scheduling algorithm that considers time-variant latency SLO constraints and periodically updates the scheduling configuration. We evaluate our scheduler with several DL workloads on an FPGA-GPU cluster. The results show that our scheduler improves both energy consumption and SLO compliance compared to a conventional DL scheduling approach.
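The abstract only sketches the core mechanism: at each scheduling period, the scheduler re-reads each job's latency SLO and selects the accelerator configuration that minimizes cost while staying within that SLO. The Python sketch below illustrates this general idea under stated assumptions; the profiling table, job names, and the feasible-then-cheapest selection rule are illustrative placeholders, not the paper's actual algorithm or measurements.

```python
# Minimal sketch of a periodic, SLO-aware scheduler for an FPGA-GPU edge cluster.
# The profiling numbers, job names, and greedy selection rule are illustrative
# assumptions, not the authors' actual method or measured data.
from dataclasses import dataclass


@dataclass
class Config:
    accelerator: str      # "fpga" or "gpu"
    batch_size: int
    latency_ms: float     # profiled per-request latency for this configuration
    energy_j: float       # profiled per-request energy cost


@dataclass
class Job:
    name: str
    slo_ms: float         # current latency SLO; may change between periods


# Hypothetical offline profiling table: candidate configurations per job.
PROFILE = {
    "resnet50": [
        Config("fpga", 1, 18.0, 0.9),
        Config("gpu", 4, 9.0, 2.1),
        Config("gpu", 16, 25.0, 1.4),
    ],
    "bert-base": [
        Config("fpga", 1, 40.0, 1.6),
        Config("gpu", 8, 22.0, 3.0),
    ],
}


def schedule(jobs: list[Job]) -> dict[str, Config]:
    """Pick, for each job, the cheapest profiled configuration that meets its SLO.

    Falls back to the lowest-latency configuration when no candidate satisfies it.
    """
    plan = {}
    for job in jobs:
        candidates = PROFILE[job.name]
        feasible = [c for c in candidates if c.latency_ms <= job.slo_ms]
        if feasible:
            plan[job.name] = min(feasible, key=lambda c: c.energy_j)
        else:
            plan[job.name] = min(candidates, key=lambda c: c.latency_ms)
    return plan


if __name__ == "__main__":
    # SLOs are re-read every scheduling period, so the plan can change over time.
    jobs = [Job("resnet50", slo_ms=20.0), Job("bert-base", slo_ms=30.0)]
    for name, cfg in schedule(jobs).items():
        print(f"{name}: {cfg.accelerator} (batch {cfg.batch_size}, "
              f"{cfg.latency_ms} ms, {cfg.energy_j} J/request)")
```

Run once per scheduling period with the jobs' latest SLOs; the same greedy pass then yields an updated accelerator plan whenever the SLOs or the profiled costs change.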
Acknowledgements
This work is supported by Samsung Electronics Co., Ltd.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Kim, T., Jeon, M., Lee, C., Kim, S., AL-Hazemi, F., Youn, CH. (2024). SLO-Aware DL Job Scheduling for Efficient FPGA-GPU Edge Cloud Computing. In: Casteleyn, S., Mikkonen, T., García Simón, A., Ko, IY., Loseto, G. (eds) Current Trends in Web Engineering. ICWE 2023. Communications in Computer and Information Science, vol 1898. Springer, Cham. https://doi.org/10.1007/978-3-031-50385-6_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-50384-9
Online ISBN: 978-3-031-50385-6
eBook Packages: Computer Science (R0)