Collaborative GPU Preemption via Spatial Multitasking for Efficient GPU Sharing

Ji, Zhuoran; Wang, Cho-Li

doi:10.1007/978-3-030-85665-6_6

Zhuoran Ji & Cho-Li Wang

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12820)

Included in the following conference series:

European Conference on Parallel Processing


Abstract

GPUs are widely used in data centers and are often over-provisioned to satisfy the stringent latency targets of latency-sensitive (LS) jobs. The resulting GPU under-utilization provides a strong incentive to share GPUs among LS jobs and batch jobs. Preemptive GPU prioritization is costly because GPU contexts are large. Many novel GPU preemption techniques have been proposed, exhibiting different trade-offs between preemption latency and overhead. Prior works also propose collaborative methods, which intelligently select the preemption technique at preemption time. However, GPU kernels usually adopt code transformation to improve performance, which also affects the preemption costs. Because kernel transformation is performed before launch, the choice of preemption technique is also fixed at that point. It is therefore impractical to select a preemption technique arbitrarily at preemption time if code transformation is adopted. This paper presents CPSpatial, which combines GPU preemption techniques via GPU spatial multitasking. CPSpatial introduces a preemption hierarchy and SM-prefetching, achieving both low latency and high throughput. Evaluations show that CPSpatial has zero preemption latency, like traditional instant preemption techniques, and at the same time achieves up to 1.43× throughput. When dealing with a sudden increase in LS job workload, CPSpatial reduces the preemption latency by 87.3% compared with the state-of-the-art GPU context switching method.


Notes

1. https://github.com/jizhuoran/cpspatial


Acknowledgements

This research is supported by Hong Kong RGC Research Impact Fund R5060-19. We thank the Euro-Par reviewers for their constructive comments and suggestions.

Author information

Authors and Affiliations

The University of Hong Kong, Hong Kong, China
Zhuoran Ji & Cho-Li Wang


Corresponding author

Correspondence to Zhuoran Ji.

Editor information

Editors and Affiliations

Universidade de Lisboa, Lisbon, Portugal
Leonel Sousa
Universidade de Lisboa, Lisbon, Portugal
Nuno Roma
Universidade de Lisboa, Lisbon, Portugal
Pedro Tomás


About this paper

Cite this paper

Ji, Z., Wang, C.-L. (2021). Collaborative GPU Preemption via Spatial Multitasking for Efficient GPU Sharing. In: Sousa, L., Roma, N., Tomás, P. (eds) Euro-Par 2021: Parallel Processing. Euro-Par 2021. Lecture Notes in Computer Science, vol 12820. Springer, Cham. https://doi.org/10.1007/978-3-030-85665-6_6


DOI: https://doi.org/10.1007/978-3-030-85665-6_6
Published: 25 August 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-85664-9
Online ISBN: 978-3-030-85665-6
eBook Packages: Computer Science, Computer Science (R0)
