Bibliography
Vadhiyar S, Dongarra J (2003) SRS – a framework for developing malleable and migratable parallel software. Parallel Process Lett 13(2):291–312
Beck M, Plank JS, Kingsley G (1994) Compiler-assisted checkpointing. In: Technical report CS-94-269, Department of Computer Science, University of Tennessee, Knoxville, December 1994
Li CCJ, Stewart EM, Fuchs WK (1994) Compiler-assisted full checkpointing. Softw Pract Exper 24(10):871–886
University of Mannheim, University of Tennessee, and NERSC/LBNL. TOP500 Supercomputing Sites. http://www.top500.org/
Lawrence Livermore National Laboratory. NNSA awards IBM contract to build next generation supercomputer, press release. https://publicaffairs.llnl.gov/news/newsreleases/2009/NR-09-02-01.html. Accessed Feb 2009
Bronevetsky G, Pingali K, Stodghill P (2006) Experimental evaluation of application-level checkpointing for OpenMP programs. In: International conference on supercomputing (ICS), Queensland, June 2006
Chandy M, Lamport L (1985) Distributed snapshots: determining global states of distributed systems. ACM Trans Comput Syst 3(1):63–75
Schulz M, Bronevetsky G, Fernandes R, Marques D, Pingali K, Stodghill P (2004) Implementation and evaluation of a scalable application-level checkpoint-recovery scheme for MPI programs. In: Proceedings of IEEE/ACM supercomputing ’04, Washington, DC, November 2004
Silva LM, Silva JG (1998) An experimental study about diskless checkpointing. EUROMICRO Conf 1:10395
Plank JS, Li K, Puening MA (1998) Diskless checkpointing. IEEE Trans Parallel Distrib Syst 9(10):972–986
Zheng G, Shi L, Kale LV (2004) FTC-Charm++: an in-memory checkpoint-based fault tolerant runtime for Charm++ and MPI. In: 2004 IEEE international conference on cluster computing, pp 93–103, San Diego, September 2004
Moody A, Bronevetsky G, Mohror K, de Supinski BR (2010) Design, modeling, and evaluation of a scalable multi-level checkpointing system. In: Proceedings of IEEE/ACM supercomputing ’10, New Orleans, LA, 2010
Agarwal S, Garg R, Gupta MS, Moreira JE (2004) Adaptive incremental checkpointing for massively parallel systems. In: ICS ’04: proceedings of the 18th annual international conference on supercomputing. ACM, New York, pp 277–286
Sancho JC, Petrini F, Johnson G, Fernández J, Frachtenberg E (2004) On the feasibility of incremental checkpointing for scientific computing. Parallel Distrib Process Symp Int 1:58b
Litzkow M, Tannenbaum T, Basney J, Livny M (1997) Checkpoint and migration of UNIX processes in the Condor distributed processing system. In: Technical report 1346, University of Wisconsin, Madison, 1997
CHARM research group. http://charm.cs.uiuc.edu/
Kale LV, Krishnan S (1993) CHARM++: a portable concurrent object oriented system based on C++. SIGPLAN Not 28(10):91–108
Elnozahy M, Alvisi L, Wang YM, Johnson DB (1996) A survey of rollback-recovery protocols in message passing systems. In: Technical report CMU-CS-96-181, School of Computer Science, Carnegie Mellon University, Pittsburgh, October 1996
Librato. Availability Services (AvS). http://www.librato.com/products/availability.services
Plank JS, Beck M, Kingsley G, Li K (1994) Libckpt: transparent checkpointing under UNIX. In: Technical report UT-CS-94-242, Department of Computer Science, University of Tennessee, Princeton University
Duell J. The design and implementation of Berkeley Lab’s Linux checkpoint/restart. http://www.nersc.gov/research/FTG/checkpoint/reports.html
Stellner G (1996) CoCheck: checkpointing and process migration for MPI. In: Proceedings of the 10th international parallel processing symposium (IPPS ’96), Honolulu, 1996
Bouteiller A, Cappello F, Herault T, Krawezik G, Lemarinier P, Magniette F (2003) MPICH-V2: a fault tolerant MPI for volatile nodes based on pessimistic sender based message logging. In: Proceedings of IEEE/ACM supercomputing ’03, Phoenix, November 2003
Wang YM, Fuchs WK (1992) Optimistic message logging for independent checkpointing in message-passing systems. In: Proceedings of the 11th symposium on reliable distributed systems, Houston, October 1992, pp 147–154
© 2011 Springer Science+Business Media, LLC
About this entry
Cite this entry
Schulz, M. (2011). Checkpointing. In: Padua, D. (eds) Encyclopedia of Parallel Computing. Springer, Boston, MA. https://doi.org/10.1007/978-0-387-09766-4_62
DOI: https://doi.org/10.1007/978-0-387-09766-4_62
Publisher Name: Springer, Boston, MA
Print ISBN: 978-0-387-09765-7
Online ISBN: 978-0-387-09766-4
eBook Packages: Computer Science, Reference Module Computer Science and Engineering