CRState: checkpoint/restart of OpenCL program for in-kernel applications

Chen, Genlang; Zhang, Jiajian; Zhu, Zufang; Jiang, Qiangqiang; Jiang, Hai; Pang, Chaoyi

doi:10.1007/s11227-020-03460-2

Published: 06 November 2020

The Journal of Supercomputing, volume 77, pages 5426–5467 (2021)

Genlang Chen ORCID: orcid.org/0000-0003-4381-7988¹,
Jiajian Zhang²,
Zufang Zhu²,
Qiangqiang Jiang²,
Hai Jiang³ &
Chaoyi Pang¹

Abstract

The checkpoint/restart mechanism is critical in a preemptive system: clusters that support it gain fault tolerance, better load balance, and higher resource utilization. As graphics processing units (GPUs) have become commonplace with the advent of general-purpose computation, and because Open Computing Language (OpenCL) programs are portable across various CPUs and GPUs, it is increasingly important to support checkpoint/restart in OpenCL programs. However, due to the complexity of the GPU's internal computation state, there is currently no effective and reasonable checkpoint/restart scheme for OpenCL applications. This paper proposes a feasible system, checkpoint/restart state (CRState), to achieve checkpoint/restart inside GPU kernels. The computation state in the underlying hardware, including the heap, data segments, local memory, stack, and code segments, is identified and concretized to establish an association between the low-level state and its application-level representation. A pre-compiler then inserts primitives into OpenCL programs at compile time so that the major components of the computation state can be extracted at runtime. Because the computation state is duplicated at the application level, such OpenCL programs can be preempted and ported across heterogeneous devices. A comprehensive example and ten authoritative benchmark programs are used to demonstrate the feasibility and effectiveness of the proposed system.
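The idea of duplicating a kernel's computation state at the application level can be illustrated with a small sketch. The plain C code below is not the CRState pre-compiler or its output; it only mimics, on the host, what inserted checkpoint primitives might do for a single work-item. The names `WorkItemState`, `sum_kernel`, `resume_point`, and `preempt_at` are hypothetical and chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical application-level snapshot of one work-item.
 * resume_point stands in for the position in the code,
 * i and acc stand in for the live stack variables. */
typedef struct {
    uint32_t resume_point; /* 0 = fresh start, 1 = resume inside the loop */
    uint32_t i;            /* saved loop counter */
    uint64_t acc;          /* saved partial sum */
} WorkItemState;

/* Sums data[0..n). Returns 1 when finished (result in *out),
 * or 0 when preempted, with the live state saved in *st.
 * preempt_at is an illustrative stand-in for a preemption signal:
 * the iteration at which the caller wants the loop to stop. */
int sum_kernel(const uint32_t *data, uint32_t n, uint64_t *out,
               WorkItemState *st, uint32_t preempt_at)
{
    assert(data != NULL && out != NULL && st != NULL);

    uint32_t i = 0;
    uint64_t acc = 0;

    /* Restart path: reload the saved variables instead of the defaults. */
    if (st->resume_point == 1) {
        i = st->i;
        acc = st->acc;
        st->resume_point = 0;
    }

    for (; i < n; ++i) {
        /* Checkpoint site: copy live variables into the snapshot. */
        if (i == preempt_at) {
            st->resume_point = 1;
            st->i = i;
            st->acc = acc;
            return 0;
        }
        acc += data[i];
    }

    *out = acc;
    return 1;
}
```

A caller would start with a zeroed `WorkItemState`, invoke `sum_kernel` with some `preempt_at`, and, if it returns 0, later call it again with the saved state (possibly copied to another device's memory) and a `preempt_at` value that is never reached, such as `UINT32_MAX`. Because the snapshot holds only named variables with fixed-width types rather than raw registers, it does not depend on the device that produced it, which is the property the abstract relies on for portability.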

Author information

Authors and Affiliations

Zhejiang University Ningbo Institute of Technology, Zhejiang University Ningbo Research Institute, Ningbo, 315100, China
Genlang Chen & Chaoyi Pang
College of Computer Science, Polytechnic Institute, College of Software Technology, Zhejiang University, Hangzhou, 310058, China
Jiajian Zhang, Zufang Zhu & Qiangqiang Jiang
Department of Computer Science, Arkansas State University, Jonesboro, AR, 72746, USA
Hai Jiang

Corresponding author

Correspondence to Genlang Chen.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Funding: Natural Science Foundation of Zhejiang Province (Grant No. LY20F020001).

About this article

Cite this article

Chen, G., Zhang, J., Zhu, Z. et al. CRState: checkpoint/restart of OpenCL program for in-kernel applications. J Supercomput 77, 5426–5467 (2021). https://doi.org/10.1007/s11227-020-03460-2

Accepted: 14 October 2020
Published: 06 November 2020
Issue Date: June 2021
DOI: https://doi.org/10.1007/s11227-020-03460-2
