Abstract
With proof techniques like IC3 and k-induction, model-checking scales further than ever before. Still, fault-tolerant distributed systems are particularly challenging to model-check given their large state spaces and non-determinism. The typical approach to controlling complexity is to construct ad-hoc abstractions of faults, message-passing, and behaviors. However, these abstractions come at the price of divorcing the model from its implementation and making refactoring difficult. In this work, we present a model for fault-tolerant distributed system verification that combines ideas from the literature including calendar automata, symbolic fault injection, and abstract transition systems, and then use it to model-check various implementations of the Hybrid Oral Messages algorithm that differ in the fault model, timing model, and local node behavior. We show that despite being implementation-level models, the verifications are scalable and modular, insofar as isolated changes to an implementation require isolated changes to the model and proofs. This work is carried out in the SAL model-checker.
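To make the role of the fault model concrete, here is a minimal sketch in Python. It is not the authors' SAL model. It brute-forces a simplified OM(1) exchange: a general sends to n-1 receivers, which then relay what they got, and every node is good, benign, symmetric, or asymmetric faulty.

Several things differ from the paper's approach:
- Faulty behaviour is enumerated explicitly rather than injected symbolically.
- There is no timing model.
- All names are hypothetical: `om1_holds`, `vote`, `sends`, `runs`, the error token `ERR`, and the tie-breaking `DEFAULT`.
- The voting rule is an assumption: voters discard detected omissions but count a good node's relayed `ERR` report as a value.

```python
"""Brute-force check of a simplified OM(1) exchange under a hybrid fault model.

Illustrative sketch only; this is not the SAL model from the paper. Faulty
behaviour is enumerated explicitly instead of being injected symbolically.
"""
from collections import Counter
from itertools import product

GOOD, BENIGN, SYMMETRIC, ASYMMETRIC = "good", "benign", "symmetric", "asymmetric"
VALUES = (0, 1)
ERR = "E"                         # assumption: a good node's report of a missing message
FAULTY_CHOICES = VALUES + (ERR,)  # anything a (non-benign) faulty node may send
DEFAULT = 0                       # assumption: value chosen when there is no majority


def vote(received):
    """Strict majority of the received values, ignoring detected omissions (None)."""
    present = [x for x in received if x is not None]
    if not present:
        return DEFAULT
    value, count = Counter(present).most_common(1)[0]
    return value if 2 * count > len(present) else DEFAULT


def sends(mode, honest_value, recipients):
    """Yield every {recipient: message} map a node in `mode` may send."""
    if mode == GOOD:
        yield {r: honest_value for r in recipients}
    elif mode == BENIGN:
        yield {r: None for r in recipients}       # omission, detected by every receiver
    elif mode == SYMMETRIC:
        for x in FAULTY_CHOICES:                   # one arbitrary value, same for all
            yield {r: x for r in recipients}
    else:                                          # ASYMMETRIC: arbitrary value per recipient
        for xs in product(FAULTY_CHOICES, repeat=len(recipients)):
            yield dict(zip(recipients, xs))


def runs(modes, v):
    """Yield {good receiver: decision} for every possible faulty behaviour.

    Node 0 is the general with value `v`; nodes 1..n-1 are receivers.
    """
    receivers = list(range(1, len(modes)))
    for r1 in sends(modes[0], v, receivers):                 # round 1: general sends
        got = {i: ERR if r1[i] is None else r1[i] for i in receivers}
        options = [list(sends(modes[j], got[j], [i for i in receivers if i != j]))
                   for j in receivers]
        for relays in product(*options):                     # round 2: receivers relay
            relay_of = dict(zip(receivers, relays))
            yield {i: vote([got[i]] + [relay_of[j][i] for j in receivers if j != i])
                   for i in receivers if modes[i] == GOOD}


def om1_holds(n, a=0, s=0, b=0):
    """True iff agreement and validity hold for every placement and behaviour of
    at most `a` asymmetric, `s` symmetric and `b` benign faulty nodes among `n`."""
    for modes in product((GOOD, BENIGN, SYMMETRIC, ASYMMETRIC), repeat=n):
        if (modes.count(ASYMMETRIC) > a or modes.count(SYMMETRIC) > s
                or modes.count(BENIGN) > b):
            continue
        for v in VALUES:
            for decisions in runs(modes, v):
                if len(set(decisions.values())) > 1:
                    return False                              # agreement violated
                if modes[0] == GOOD and any(d != v for d in decisions.values()):
                    return False                              # validity violated
    return True
```

For example, `om1_holds(4, a=1)` returns `True`, while `om1_holds(3, a=1)` and `om1_holds(3, s=1)` return `False`. A single benign fault, by contrast, is tolerated with three nodes (`om1_holds(3, b=1)` is `True`). Between these calls only the fault assumptions change; the protocol code stays the same.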
Notes
- 1.
- 2. There are exceptions; for example, benign faults may be detected by a node itself (e.g., in a built-in test).
- 3.
Acknowledgments
This work is partially supported by NASA contract #NNL14AA08C. We are indebted to our collaborators Brendan Hall and Srivatsan Varadarajan at Honeywell Labs, and to Wilfredo Torres-Pomales at NASA Langley for their discussions and insights. Additionally, we acknowledge that this work is heavily inspired by a series of papers authored by John Rushby.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Jones, B.F., Pike, L. (2017). Modular Model-Checking of a Byzantine Fault-Tolerant Protocol. In: Barrett, C., Davies, M., Kahsai, T. (eds) NASA Formal Methods. NFM 2017. Lecture Notes in Computer Science, vol 10227. Springer, Cham. https://doi.org/10.1007/978-3-319-57288-8_12
DOI: https://doi.org/10.1007/978-3-319-57288-8_12
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-57287-1
Online ISBN: 978-3-319-57288-8
eBook Packages: Computer Science, Computer Science (R0)