Abstract
Contemporary Grid environments are characterized by growing virtualization and distribution of resources. These trends place greater demands on load-balancing and fault-tolerance capabilities. The checkpoint-restart mechanism appears to be the most natural tool for meeting these requirements. One of the goals of the CoreGRID Network of Excellence is to define a high-level checkpoint-restart Grid Service and to position it among other Grid Services. We aim to define both the abstract model of that service and the lower-layer interface that will allow the service to cooperate with diverse existing and future checkpoint-restart tools. This paper is the first step toward that goal. It sketches the overall architecture of the proposed service and its connection to the actual checkpoint-restart tools. It also mentions the work on the low-level checkpoint-restart tools to be used in the “proof of concept” implementation and integration.
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jankowski, G., Kovacs, J., Meyer, N., Januszewski, R., Mikolajczak, R. (2006). Towards Checkpointing Grid Architecture. In: Wyrzykowski, R., Dongarra, J., Meyer, N., Waśniewski, J. (eds) Parallel Processing and Applied Mathematics. PPAM 2005. Lecture Notes in Computer Science, vol 3911. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11752578_79
DOI: https://doi.org/10.1007/11752578_79
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-34141-3
Online ISBN: 978-3-540-34142-0
eBook Packages: Computer Science (R0)