CRAC: An Automatic Assistant Compiler of Checkpoint/Restart for OpenCL Program

  • Conference paper
  • Data Science (ICDS 2019)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1179)

Abstract

Today, multiple devices are used together to meet growing computing demands. As multi-card computing spreads, fault tolerance, load balancing, and resource sharing have become pressing issues, and a checkpoint/restart (CR) mechanism is critical in a preemptive system. This paper proposes a checkpoint/restart framework, including an automatic compiler (CRAC), that provides a practical checkpoint/restart system, especially for GPU applications written in OpenCL and running on heterogeneous devices. Given the checkpoint/restart positions specified in the source code, CRAC inserts the corresponding primitives into the program and invokes runtime support modules to produce the final results. A comprehensive example and experiments demonstrate the feasibility and effectiveness of the proposed framework.
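The abstract describes the general pattern of application-level checkpoint/restart: state is saved at user-chosen positions in the program and reloaded on restart. The following is a minimal sketch of that pattern in Python, assuming a single host-side list stands in for OpenCL device memory and a plain function stands in for a kernel launch. The names `cr_checkpoint`, `cr_restart`, `kernel_step`, `run`, and `CKPT_PATH` are hypothetical illustrations, not CRAC's primitives or runtime API, which the abstract does not specify. In a real OpenCL host program, device buffers would first have to be copied back to host memory (for example with `clEnqueueReadBuffer`) before being saved.

```python
import json
import os
import tempfile

# Hypothetical checkpoint file location (an assumption for this sketch).
CKPT_PATH = os.path.join(tempfile.gettempdir(), "crac_demo.ckpt")


def cr_checkpoint(path, step, buffer):
    """Hypothetical checkpoint primitive: persist the loop index and a
    host copy of the (simulated) device buffer."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"step": step, "buffer": buffer}, f)
    # Atomic rename, so a crash mid-write never leaves a corrupt checkpoint.
    os.replace(tmp, path)


def cr_restart(path, initial_buffer):
    """Hypothetical restart primitive: resume from the last checkpoint,
    or start from scratch if none exists."""
    if not os.path.exists(path):
        return 0, list(initial_buffer)
    with open(path) as f:
        state = json.load(f)
    return state["step"], state["buffer"]


def kernel_step(buffer):
    """Stand-in for one OpenCL kernel launch: increments every element."""
    return [x + 1 for x in buffer]


def run(path, n_steps, initial, fail_at=None):
    """Iterative computation with a checkpoint inserted after each step.
    `fail_at` simulates a crash before the given step (testing only)."""
    step, buf = cr_restart(path, initial)
    while step < n_steps:
        if fail_at is not None and step == fail_at:
            raise RuntimeError("simulated failure")
        buf = kernel_step(buf)
        step += 1
        cr_checkpoint(path, step, buf)  # checkpoint position chosen by the user
    return buf
```

A first call such as `run(CKPT_PATH, 5, [0, 10], fail_at=3)` stops after three steps; a second call `run(CKPT_PATH, 5, [0, 10])` resumes from step 3 instead of step 0 and finishes with `[5, 15]`. Checkpointing after every step keeps the sketch short; a real system would checkpoint less often to limit the cost of copying device memory back to the host.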

Supported by the Natural Science Foundation of China (No. 61572022) and the Ningbo eHealth Project (No. 2016C11024).

Author information

Correspondence to Genlang Chen.

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Chen, G., Zhang, J., Zhu, Z., Zhu, C., Jiang, H., Pang, C. (2020). CRAC: An Automatic Assistant Compiler of Checkpoint/Restart for OpenCL Program. In: He, J., et al. Data Science. ICDS 2019. Communications in Computer and Information Science, vol 1179. Springer, Singapore. https://doi.org/10.1007/978-981-15-2810-1_54

  • DOI: https://doi.org/10.1007/978-981-15-2810-1_54

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-2809-5

  • Online ISBN: 978-981-15-2810-1

  • eBook Packages: Computer Science, Computer Science (R0)
