Abstract
The simple checkpointing and migration system for UNIX processes described by Bozyigit and Wasiq [1] can be improved in two ways: first, by a technique that checkpoints and migrates applications without recompiling them, and second, by an alternative approach that precisely locates all the data segments of a process that need to be checkpointed. We fully acknowledge the difficulty of checkpointing, let alone portable checkpointing, for processes in the general case, and we do not claim to solve the many remaining problems with the simplistic checkpointing and migration approaches presented in the earlier article. Still, we are aware of many systems and applications where a simple solution is extremely helpful once it also works with binaries.
- M. Bozyigit and M. Wasiq. User-Level Process Checkpoint and Restore for Migration. Operating Systems Review, 35(2):86-95, 2001.
- Balkrishna Ramkumar and Volker Strumpen. Portable Checkpointing for Heterogeneous Architectures. In 27th International Symposium on Fault-Tolerant Computing, Digest of Papers, pages 58-67, April 1997.
- F. Rauch. Porting ckpt_lib to different UNIX operating systems. Internal report, ISE Integrated Systems Engineering, Zürich, Switzerland, October 1996.
- Felix Rauch, Christian Kurmann, Blanca Maria Müller-Lagunez, and Thomas M. Stricker. Patagonia: A Dual Use Cluster of PCs for Computation and Education. In 2. Workshop Cluster Computing, Karlsruhe, pages 65-75, March 1999.
Index Terms
- Comments on "transparent user-level process checkpoint and restore for migration" by Bozyigit and Wasiq