Abstract
The Grid community has invested considerable effort in developing middleware that provides functionality such as resource discovery, resource management, job submission, and execution monitoring. As part of this effort, this paper addresses the design and implementation of CPPC-G, a service-based architecture for managing the execution of fault-tolerant applications on Grids. The CPPC (Controller/Precompiler for Portable Checkpointing) framework is used to insert checkpoint instrumentation into the application code. The services are responsible for submitting and monitoring the execution of the application, managing checkpoint files, and detecting and automatically restarting failed executions.
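The detect-and-restart behaviour described in the abstract can be illustrated with a minimal sketch. It is not the CPPC-G implementation and uses no Grid middleware API: `RunResult`, `run_with_restart`, `make_flaky_job`, and the checkpoint names are hypothetical, and the `submit` callable stands in for whatever service actually submits and monitors a job.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class RunResult:
    """Outcome of one (hypothetical) job execution."""
    completed: bool                                        # True if the run finished
    checkpoints: List[str] = field(default_factory=list)   # files written, oldest first


def run_with_restart(
    submit: Callable[[Optional[str]], RunResult],
    max_restarts: int = 3,
) -> List[Optional[str]]:
    """Submit a job and resubmit it from the newest checkpoint on failure.

    Returns the restart point used for each attempt (None means "from scratch").
    Raises RuntimeError once the restart budget is exhausted.
    """
    latest: Optional[str] = None
    restart_points: List[Optional[str]] = []
    for _ in range(max_restarts + 1):
        restart_points.append(latest)
        result = submit(latest)             # assumed to block until the run ends
        if result.checkpoints:
            latest = result.checkpoints[-1]  # remember the newest checkpoint
        if result.completed:
            return restart_points
    raise RuntimeError("job still failing after %d restarts" % max_restarts)


def make_flaky_job(failures: int) -> Callable[[Optional[str]], RunResult]:
    """Build a fake job that fails `failures` times, then succeeds."""
    calls = {"n": 0}

    def submit(restart_from: Optional[str]) -> RunResult:
        calls["n"] += 1
        if calls["n"] <= failures:
            # Simulated failure that still left one checkpoint behind.
            return RunResult(completed=False, checkpoints=["ckpt-%d" % calls["n"]])
        return RunResult(completed=True)

    return submit


if __name__ == "__main__":
    print(run_with_restart(make_flaky_job(2)))
```

In this sketch each failed attempt leaves a checkpoint behind, so every resubmission resumes from the most recent one rather than from the beginning; a real service would also have to locate and transfer the checkpoint files between Grid resources.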
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Díaz, D., Pardo, X.C., Martín, M.J., González, P., Rodríguez, G. (2008). CPPC-G: Fault-Tolerant Applications on the Grid. In: Wyrzykowski, R., Dongarra, J., Karczewski, K., Wasniewski, J. (eds) Parallel Processing and Applied Mathematics. PPAM 2007. Lecture Notes in Computer Science, vol 4967. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68111-3_90
DOI: https://doi.org/10.1007/978-3-540-68111-3_90
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68105-2
Online ISBN: 978-3-540-68111-3
eBook Packages: Computer Science, Computer Science (R0)