
CPPC-G: Fault-Tolerant Applications on the Grid

  • Conference paper
Parallel Processing and Applied Mathematics (PPAM 2007)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4967)


Abstract

The Grid community has invested considerable effort in developing middleware that provides functionality such as resource discovery, resource management, job submission, and execution monitoring. As part of this effort, this paper addresses the design and implementation of CPPC-G, a service-based architecture for managing the execution of fault-tolerant applications on Grids. The CPPC (Controller/Precompiler for Portable Checkpointing) framework is used to insert checkpoint instrumentation into the application code. The designed services are responsible for submitting the application and monitoring its execution, managing checkpoint files, and detecting failed executions and restarting them automatically.
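The detect-and-restart behaviour described in the abstract can be illustrated with a minimal, single-process Python sketch. It is not CPPC-G's implementation: the names `CheckpointStore`, `run_app`, `supervise`, and `ExecutionFailed` are hypothetical, failures are simulated with an exception, and checkpoints live in an in-memory dictionary rather than in checkpoint files managed by Grid services. The toy application saves its state every `ckpt_every` steps and, when relaunched, resumes from the latest checkpoint; this hand-written code stands in for the instrumentation that CPPC inserts automatically.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Set, Tuple


class ExecutionFailed(Exception):
    """Raised when a (simulated) execution terminates abnormally."""


@dataclass
class CheckpointStore:
    """Hypothetical stand-in for checkpoint-file management: keeps only the
    latest checkpoint of each job, in memory instead of on remote storage."""
    latest: Dict[str, dict] = field(default_factory=dict)

    def save(self, job_id: str, state: dict) -> None:
        # Store a copy so the running application cannot mutate a saved checkpoint.
        self.latest[job_id] = dict(state)

    def load(self, job_id: str) -> Optional[dict]:
        state = self.latest.get(job_id)
        return dict(state) if state is not None else None


def run_app(job_id: str, store: CheckpointStore, total_steps: int,
            ckpt_every: int, failures: Set[int]) -> int:
    """Toy application with hand-written checkpoint instrumentation.
    Resumes from the latest checkpoint, if any, and crashes once at each
    step listed in `failures` (simulated node faults; the set is consumed)."""
    state = store.load(job_id) or {"step": 0, "acc": 0}
    while state["step"] < total_steps:
        if state["step"] in failures:
            failures.discard(state["step"])  # each simulated fault happens only once
            raise ExecutionFailed(f"{job_id} failed at step {state['step']}")
        state["acc"] += state["step"]  # the "computation": sum of step indices
        state["step"] += 1
        if state["step"] % ckpt_every == 0:
            store.save(job_id, state)  # periodic checkpoint of step and acc together
    return state["acc"]


def supervise(launch: Callable[[], int], max_restarts: int) -> Tuple[int, int]:
    """Hypothetical monitor/restart loop: run the job, detect abnormal
    termination, and relaunch it (resuming from the latest checkpoint) until
    it completes or `max_restarts` restarts have been used up."""
    restarts = 0
    while True:
        try:
            return launch(), restarts
        except ExecutionFailed:
            if restarts >= max_restarts:
                raise  # give up and propagate the failure
            restarts += 1


if __name__ == "__main__":
    store = CheckpointStore()
    faults = {7, 13}  # shared across relaunches so each fault occurs once
    result, restarts = supervise(
        lambda: run_app("job-1", store, 20, 5, faults), max_restarts=3)
    print(result, restarts)
```

Running the script prints `190 2`: the job reaches the same result as a fault-free run (the sum of 0 to 19) after two automatic restarts, resuming from steps 5 and 10. Saving `step` and `acc` together in one checkpoint is what makes the restart correct; a checkpoint holding only one of them would resume from an inconsistent state.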

Editor information

Roman Wyrzykowski, Jack Dongarra, Konrad Karczewski, Jerzy Wasniewski

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Díaz, D., Pardo, X.C., Martín, M.J., González, P., Rodríguez, G. (2008). CPPC-G: Fault-Tolerant Applications on the Grid. In: Wyrzykowski, R., Dongarra, J., Karczewski, K., Wasniewski, J. (eds) Parallel Processing and Applied Mathematics. PPAM 2007. Lecture Notes in Computer Science, vol 4967. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68111-3_90

  • DOI: https://doi.org/10.1007/978-3-540-68111-3_90

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-68105-2

  • Online ISBN: 978-3-540-68111-3

  • eBook Packages: Computer Science, Computer Science (R0)
