Abstract
Unreliable failure detectors are a well-known means of enriching asynchronous distributed systems with time-free semantics that make it possible to solve consensus in the presence of crash failures. Implementing unreliable failure detectors requires a system that provides some synchrony, typically an upper bound on end-to-end message delays. Recently, we introduced an implementation of the perfect failure detector in a novel partially synchronous model, referred to as the Θ-Model, where only the ratio Θ of maximum vs. minimum end-to-end delay of messages that are simultaneously in transit must be known a priori (the actual delays need not be known, nor even bounded). In this paper, we present an alternative failure detector algorithm, which is based on a clock synchronization algorithm for the Θ-Model. It not only surpasses our first implementation with respect to failure detection time, but also works during the system booting phase.
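The synchrony assumption of the Θ-Model, as described in the abstract, can be written as a single inequality. The symbols τ⁺(t) and τ⁻(t) are notation assumed here for illustration and do not appear in the abstract: they stand for the maximum and minimum end-to-end delay among the messages that are in transit at time t.

```latex
% Sketch of the Theta-Model assumption as stated in the abstract.
% \tau^{+}(t), \tau^{-}(t): assumed notation for the maximum and minimum
% end-to-end delay of the messages simultaneously in transit at time t.
\[
  \frac{\tau^{+}(t)}{\tau^{-}(t)} \;\le\; \Theta
  \qquad \text{for all times } t .
\]
% Only \Theta must be known a priori; \tau^{+}(t) and \tau^{-}(t)
% themselves may be unknown and need not be bounded.
```

Read this way, both delays may grow without limit as long as they grow together, which is why no absolute upper bound on message delay is needed.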
Supported by the Austrian START program Y41-MAT, the BM:vit FIT-IT project DCBA (proj. no. 808198), and by the FWF project Theta (proj. no. P17757-N04).
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Widder, J., Le Lann, G., Schmid, U. (2005). Failure Detection with Booting in Partially Synchronous Systems. In: Dal Cin, M., Kaâniche, M., Pataricza, A. (eds) Dependable Computing - EDCC 5. EDCC 2005. Lecture Notes in Computer Science, vol 3463. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11408901_3
DOI: https://doi.org/10.1007/11408901_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25723-3
Online ISBN: 978-3-540-32019-7
eBook Packages: Computer Science, Computer Science (R0)