Booting clock synchronization in partially synchronous systems with hybrid process and link failures

Distributed Computing

Abstract

This paper describes and analyzes a new clock synchronization algorithm for synchronous and partially synchronous systems with unknown upper and lower bounds on delays. The algorithm is purely message-driven and timer-free, and it relies on a hybrid failure model that incorporates both process and link failures, in both the time and value domains. Unlike existing solutions, our algorithm works during both system start-up and normal operation: whereas bounded precision (the mutual deviation of any two clocks) can always be guaranteed, accuracy (clocks being within a linear envelope of real time), and hence progress, is ensured only when sufficiently many correct processes are eventually up and running. By means of a detailed analysis, we provide formulas for resilience, precision, and envelope bounds.
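
The two properties named in the abstract can be written out roughly as follows. This is a generic formalization in the style common in the clock synchronization literature, not the paper's own statement: the symbols $C_p(t)$ (the clock value of process $p$ at real time $t$), $D_{\max}$, and the envelope constants $a, b, c, d$ are assumptions introduced here for illustration only.

```latex
% Precision: any two correct clocks stay within a bounded distance of each other.
% D_max is a hypothetical name for the precision bound.
\[
  \lvert C_p(t) - C_q(t) \rvert \le D_{\max}
  \quad \text{for all correct processes } p, q \text{ and all real times } t .
\]

% Accuracy: the progress of every correct clock lies within a linear envelope
% of real time. The constants a, c > 0 and b, d >= 0 are hypothetical.
\[
  a\,(t_2 - t_1) - b \;\le\; C_p(t_2) - C_p(t_1) \;\le\; c\,(t_2 - t_1) + d
  \quad \text{for all real times } t_2 \ge t_1 .
\]
```

Read this way, the abstract's claim is that the first inequality holds at all times, including during start-up, while the lower bound in the second inequality (which forces clocks to advance) requires sufficiently many correct processes to be running.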

Author information

Corresponding author

Correspondence to Josef Widder.

Additional information

This work has been supported by the Austrian START programme Y41-MAT, the BM:vit FIT-IT Embedded Systems project DCBA (proj.no. 808198), and the FWF project Theta (proj.no. P17757-N04).

About this article

Cite this article

Widder, J., Schmid, U. Booting clock synchronization in partially synchronous systems with hybrid process and link failures. Distrib. Comput. 20, 115–140 (2007). https://doi.org/10.1007/s00446-007-0026-0

  • DOI: https://doi.org/10.1007/s00446-007-0026-0
