Abstract
Traditional sender-based message logging protocols rely on a garbage collection algorithm that incurs a large number of additional messages and forced checkpoints. In our previous work, we introduced an algorithm that allows each process to autonomously remove useless log information from its volatile storage by piggybacking only a small amount of additional information, without requiring any extra messages or forced checkpoints. However, even after a process has executed that algorithm, its storage buffer may still be overloaded under some communication and checkpointing patterns. This paper proposes a new garbage collection algorithm, CCPNA, for sender-based message logging to address the problems described above. The algorithm uses the size of each process's log information to considerably reduce the number of processes that must participate in garbage collection. CCPNA incurs more additional messages and forced checkpoints than our previous algorithm, but it avoids the risk of overloading the storage buffers regardless of the specific checkpointing and communication patterns. It also requires fewer additional messages and forced checkpoints than the traditional algorithm.
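The abstract does not spell out how CCPNA works, so the following is only a minimal sketch of the general idea it describes: a sender keeps logged messages in volatile storage and, when space runs short, uses the per-receiver log size to involve as few other processes as possible. The class `SenderLog`, the function `pick_receivers_to_checkpoint`, and the greedy largest-first rule are all assumptions made for illustration, not the paper's algorithm. The sketch relies on one standard property of sender-based message logging: a logged message becomes useless once its receiver has taken a checkpoint that covers its receipt.

```python
from collections import defaultdict


class SenderLog:
    """Hypothetical volatile sender-side log: receiver id -> [(send sequence number, size in bytes)]."""

    def __init__(self):
        self.entries = defaultdict(list)

    def log(self, receiver, ssn, size):
        # Record a sent message so it can be replayed if the receiver fails.
        self.entries[receiver].append((ssn, size))

    def used_bytes(self):
        return sum(size for msgs in self.entries.values() for _, size in msgs)

    def bytes_per_receiver(self):
        # Log size attributable to each receiver that still has logged messages.
        return {r: sum(size for _, size in msgs)
                for r, msgs in self.entries.items() if msgs}

    def discard_up_to(self, receiver, checkpointed_ssn):
        # Messages covered by the receiver's checkpoint are no longer needed for recovery.
        self.entries[receiver] = [(s, n) for s, n in self.entries[receiver]
                                  if s > checkpointed_ssn]


def pick_receivers_to_checkpoint(log, bytes_needed):
    """Assumed selection rule: take receivers with the largest logs first
    until the space they would free reaches bytes_needed."""
    chosen, freed = [], 0
    ranked = sorted(log.bytes_per_receiver().items(),
                    key=lambda kv: kv[1], reverse=True)
    for receiver, nbytes in ranked:
        if freed >= bytes_needed:
            break
        chosen.append(receiver)
        freed += nbytes
    return chosen


log = SenderLog()
log.log("p1", 1, 40)
log.log("p2", 1, 10)
log.log("p3", 1, 30)
log.log("p1", 2, 20)

# Only the selected receivers would be asked to checkpoint.
selected = pick_receivers_to_checkpoint(log, bytes_needed=50)
for receiver in selected:
    log.discard_up_to(receiver, checkpointed_ssn=2)
```

In this sketch only the selected receivers would receive a request and take a forced checkpoint, which is how using log sizes could keep the number of participating processes small. A real protocol would also have to handle in-transit messages and concurrent failures, which are omitted here.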
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ahn, J. (2006). Checkpointing and Communication Pattern-Neutral Algorithm for Removing Messages Logged by Senders. In: Gerndt, M., Kranzlmüller, D. (eds) High Performance Computing and Communications. HPCC 2006. Lecture Notes in Computer Science, vol 4208. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11847366_8
DOI: https://doi.org/10.1007/11847366_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-39368-9
Online ISBN: 978-3-540-39372-6
eBook Packages: Computer Science, Computer Science (R0)