CRAC: An Automatic Assistant Compiler of Checkpoint/Restart for OpenCL Program

Chen, Genlang; Zhang, Jiajian; Zhu, Zufang; Zhu, Chaoyan; Jiang, Hai; Pang, Chaoyi

doi:10.1007/978-981-15-2810-1_54

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1179)

Included in the following conference series:

International Conference on Data Service

Abstract

Demand for computing power keeps growing, and it is increasingly met by using multiple devices. With the spread of multi-card computing, fault tolerance, load balancing, and resource sharing have become pressing issues, and a checkpoint/restart (CR) mechanism is critical in a preemptive system. This paper proposes a checkpoint/restart framework, including an automatic assistant compiler (CRAC), that provides a feasible checkpoint/restart system for OpenCL programs, especially GPU applications running on heterogeneous devices. Given the positions of checkpoint/restart in the source code, CRAC inserts primitives into the program and invokes the runtime support modules to produce the final results. A comprehensive example and experiments demonstrate the feasibility and effectiveness of the proposed framework.

Supported by Natural Science Foundation of China (No. 61572022) and the Ningbo eHealth Project (No. 2016C11024).

Author information

Authors and Affiliations

Ningbo Institute of Technology, Zhejiang University, Ningbo, 315100, China
Genlang Chen, Chaoyan Zhu & Chaoyi Pang
Ningbo Research Institute, Zhejiang University, Ningbo, 315100, China
Genlang Chen & Chaoyan Zhu
College of Computer Science, Polytechnic Institute, Zhejiang University, Hangzhou, 310058, China
Jiajian Zhang & Zufang Zhu
Department of Computer Science, Arkansas State University, Jonesboro, AR, 72746, USA
Hai Jiang

Corresponding author

Correspondence to Genlang Chen.

Editor information

Editors and Affiliations

Swinburne University of Technology, Melbourne, VIC, Australia
Jing He
University of Illinois at Chicago, Chicago, USA
Philip S. Yu
College of Information Science and Technology, University of Nebraska at Omaha, Omaha, NE, USA
Yong Shi
Research Institute of Extenics and Innovation Methods, Guangdong University of Technology, Guangzhou, China
Xingsen Li
Ningbo University, Ningbo, China
Zhijun Xie
Deakin University, Burwood, VIC, Australia
Guangyan Huang
Department of Computer Science and Technology, Nanjing University of Science and Technology, Nanjing, China
Jie Cao
Nanjing University of Posts and Telecommunications, Nanjing, China
Fu Xiao

About this paper

Cite this paper

Chen, G., Zhang, J., Zhu, Z., Zhu, C., Jiang, H., Pang, C. (2020). CRAC: An Automatic Assistant Compiler of Checkpoint/Restart for OpenCL Program. In: He, J., et al. Data Science. ICDS 2019. Communications in Computer and Information Science, vol 1179. Springer, Singapore. https://doi.org/10.1007/978-981-15-2810-1_54

DOI: https://doi.org/10.1007/978-981-15-2810-1_54
Published: 02 February 2020
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-2809-5
Online ISBN: 978-981-15-2810-1
eBook Packages: Computer Science; Computer Science (R0)