Abstract
Checkpoint/restart is critical in preemptive systems because it improves a cluster's fault tolerance, load balancing, and resource utilization. Graphics processing units (GPUs) have become commonplace with the advent of general-purpose GPU computation, and Open Computing Language (OpenCL) programs are portable across a wide range of CPUs and GPUs, so it is increasingly important to support checkpoint/restart in OpenCL programs. However, because the internal computation state of a GPU is complex, no effective and practical checkpoint/restart scheme currently exists for OpenCL applications. This paper proposes a feasible system, checkpoint/restart state (CRState), that achieves checkpoint/restart inside GPU kernels. The computation state held in the underlying hardware, including the heap, data segments, local memory, stack, and code segments, is identified and made concrete in order to associate the low-level state with its application-level representation. A pre-compiler then inserts primitives into OpenCL programs at compile time so that the major components of the computation state can be extracted at runtime. Because the computation state is duplicated at the application level, such OpenCL programs can be preempted and ported across heterogeneous devices. A comprehensive example and ten established benchmark programs are used to demonstrate the feasibility and effectiveness of the proposed system.
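The idea of duplicating computation state at the application level can be illustrated with a small host-side simulation. The sketch below is not CRState and contains no OpenCL code; it only mimics a single work-item whose loop position and live variable are kept in an explicit record, so that execution can stop at a checkpoint and later resume, possibly on a different device. The names `KernelState`, `run_work_item`, and `preempt_after` are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class KernelState:
    """Hypothetical application-level snapshot of one work-item."""
    next_index: int = 0   # stands in for the position in the code (loop counter)
    partial_sum: int = 0  # stands in for a live private (stack) variable


def run_work_item(
    data: Sequence[int],
    state: Optional[KernelState] = None,
    preempt_after: Optional[int] = None,
) -> Tuple[KernelState, bool]:
    """Sum the squares of `data`, optionally stopping after `preempt_after` steps.

    Returns the (possibly partial) state and a flag telling whether the
    computation finished. Passing a saved state resumes from that point.
    """
    # Copy the incoming state so the caller's checkpoint is never mutated.
    if state is None:
        state = KernelState()
    else:
        state = KernelState(state.next_index, state.partial_sum)

    steps = 0
    while state.next_index < len(data):
        # Checkpoint site: a pre-compiler would insert a test like this one.
        if preempt_after is not None and steps >= preempt_after:
            return state, False
        state.partial_sum += data[state.next_index] ** 2
        state.next_index += 1
        steps += 1
    return state, True


data = [1, 2, 3, 4, 5]
checkpoint, finished = run_work_item(data, preempt_after=2)  # preempted early
resumed, done = run_work_item(data, state=checkpoint)        # restart from snapshot
print(checkpoint, finished)
print(resumed.partial_sum, done)  # 55 True
```

Because the snapshot is an ordinary value rather than an opaque hardware state, it could in principle be serialized and handed to a different device, which is the property the abstract relies on for portability across heterogeneous hardware.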
Funding: Natural Science Foundation of Zhejiang Province (Grant No. LY20F020001).
Chen, G., Zhang, J., Zhu, Z. et al. CRState: checkpoint/restart of OpenCL program for in-kernel applications. J Supercomput 77, 5426–5467 (2021). https://doi.org/10.1007/s11227-020-03460-2
DOI: https://doi.org/10.1007/s11227-020-03460-2