Abstract
We investigate the problem of detecting termination of a distributed computation in an asynchronous message-passing system where processes may crash and recover. We show that it is impossible to solve the termination detection problem in this model. We identify necessary and sufficient conditions under which it is possible to solve the stabilizing version of the problem, in which a termination detection algorithm is allowed to make a finite number of mistakes. Finally, we present an algorithm that solves the stabilizing termination detection problem under these conditions.
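As an informal illustration of the stabilizing requirement, the sketch below checks a finite trace in which a detector may announce termination prematurely, revoke the announcement, and eventually agree with the true state. The trace encoding and the names `Step`, `false_announcements`, and `stabilization_point` are assumptions made for this example and are not taken from the paper; a finite trace can only suggest, not prove, that the number of mistakes is finite.

```python
from typing import List, Optional, Tuple

# Hypothetical encoding: each step pairs the ground truth with the detector output,
# (computation_has_terminated, detector_announces_termination).
Step = Tuple[bool, bool]


def false_announcements(trace: List[Step]) -> int:
    """Count steps where termination is announced although it has not occurred."""
    return sum(1 for terminated, announced in trace if announced and not terminated)


def stabilization_point(trace: List[Step]) -> Optional[int]:
    """Return the first index from which output and truth agree through the
    end of the trace, or None if the trace is empty or ends in disagreement."""
    point: Optional[int] = None
    for i, (terminated, announced) in enumerate(trace):
        if terminated != announced:
            point = None          # a disagreement resets the candidate point
        elif point is None:
            point = i             # start of a new agreeing suffix
    return point


trace: List[Step] = [
    (False, False),
    (False, True),   # mistake: premature announcement
    (False, False),  # announcement revoked
    (True, False),   # termination has occurred but is not yet detected
    (True, True),
    (True, True),
]

print(false_announcements(trace))  # number of premature announcements
print(stabilization_point(trace))  # index where the agreeing suffix begins
```

In this reading, a stabilizing detector is one whose output eventually stops changing and matches the true state, even though earlier announcements may have been wrong.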
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Freiling, F.C., Majuntke, M., Mittal, N. (2007). On Detecting Termination in the Crash-Recovery Model. In: Kermarrec, AM., Bougé, L., Priol, T. (eds) Euro-Par 2007 Parallel Processing. Euro-Par 2007. Lecture Notes in Computer Science, vol 4641. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74466-5_66
DOI: https://doi.org/10.1007/978-3-540-74466-5_66
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74465-8
Online ISBN: 978-3-540-74466-5
eBook Packages: Computer Science, Computer Science (R0)