CRState: checkpoint/restart of OpenCL program for in-kernel applications

The Journal of Supercomputing

Abstract

A checkpoint/restart mechanism is critical in a preemptive system because it improves a cluster's fault tolerance, load balancing, and resource utilization. As graphics processing units (GPUs) have become commonplace for general-purpose computation, and Open Computing Language (OpenCL) programs are portable across various CPUs and GPUs, it is increasingly important to support checkpoint/restart in OpenCL programs. However, due to the complexity of the internal computational state of the GPU, there is currently no effective and reasonable checkpoint/restart scheme for OpenCL applications. This paper proposes a feasible system, checkpoint/restart state (CRState), to achieve checkpoint/restart in GPU kernels. The components of the computation state in the underlying hardware, including the heap, data segments, local memory, stack, and code segments, are identified and concretized to establish an association between the underlying hardware-level state and its application-level representation. A pre-compiler then inserts primitives into OpenCL programs at compile time so that the major components of the computation state can be extracted at runtime. Because the computation state is duplicated at the application level, such OpenCL programs can be preempted and ported across heterogeneous devices. A comprehensive example and ten authoritative benchmark programs are used to demonstrate the feasibility and effectiveness of the proposed system.
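
The idea of duplicating computation state at the application level can be illustrated with a small host-side sketch. The code below is plain C rather than OpenCL C, and it is not CRState's implementation: the `WorkItemState` struct, the `budget` preemption signal, and both function names are assumptions made up for illustration. It shows only the general pattern the abstract describes, in which inserted checks save the live variables and the resume point so that a later invocation can continue where the earlier one stopped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical application-level snapshot of one work-item's state.
 * Field names are illustrative, not CRState's actual layout. */
typedef struct {
    bool   valid;   /* true once a checkpoint has been taken */
    size_t next_i;  /* loop index to resume from (the resume point) */
    long   acc;     /* live local variable (stack state) */
} WorkItemState;

/* Plain-C stand-in for a kernel body: sums the squares of data[0..n).
 * The budget test at the top of each iteration plays the role of a
 * primitive that a pre-compiler might insert. `budget` is the number of
 * iterations allowed before a simulated preemption.
 * Returns true when the work is complete, false when preempted. */
static bool sum_squares_kernel(const int *data, size_t n,
                               WorkItemState *st, size_t budget, long *out)
{
    assert(data != NULL && st != NULL && out != NULL);

    size_t i = 0;
    long acc = 0;

    if (st->valid) {            /* restart path: restore saved state */
        i = st->next_i;
        acc = st->acc;
    }

    for (; i < n; ++i) {
        if (budget == 0) {      /* checkpoint path: save state and yield */
            st->valid = true;
            st->next_i = i;
            st->acc = acc;
            return false;
        }
        --budget;
        acc += (long)data[i] * data[i];
    }

    st->valid = false;          /* finished: snapshot no longer needed */
    *out = acc;
    return true;
}

/* Relaunches the kernel until it completes and counts the restarts. */
static long run_to_completion(const int *data, size_t n,
                              size_t budget, int *restarts)
{
    assert(budget > 0);         /* a zero budget could never make progress */

    WorkItemState st = { false, 0, 0 };
    long result = 0;

    *restarts = 0;
    while (!sum_squares_kernel(data, n, &st, budget, &result))
        ++*restarts;
    return result;
}
```

Calling `run_to_completion` on the same input with a large budget and with a small one should yield the same sum, differing only in the number of restarts. Because the snapshot holds ordinary values rather than device addresses or registers, it could in principle be restored on a different device, which is the property the abstract relies on for portability.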

Author information

Corresponding author

Correspondence to Genlang Chen.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This work was supported by the Natural Science Foundation of Zhejiang Province (Grant No. LY20F020001).

About this article

Cite this article

Chen, G., Zhang, J., Zhu, Z. et al. CRState: checkpoint/restart of OpenCL program for in-kernel applications. J Supercomput 77, 5426–5467 (2021). https://doi.org/10.1007/s11227-020-03460-2

  • DOI: https://doi.org/10.1007/s11227-020-03460-2
