skip to main content
10.1145/3406085.3409009acmconferencesArticle/Chapter ViewAbstractPublication PagesicfpConference Proceedingsconference-collections
research-article

Teaching practical realistic verification of distributed algorithms in Erlang with TLA+

Published:23 August 2020Publication History

ABSTRACT

Distributed systems are inherently complex as they need to address the interplay between features like communication, concurrency, and failure. Due to the inherent complexity of these interacting features, it is typically not possible to systematically test these kind of systems; yet, unexpected and unlikely combinations of events might cause corner cases that are hard to find. But since these systems are running typically for long durations, these events are likely to materialize eventually and must be handled correctly. Caught in such a dilemma, students are able to experience the benefits of applying verification tools to check their own algorithms and implementations. Having executable models with automatically generated executions allows them to experiment with different solutions by iteratively adapting and refining their algorithms.

In this experience report, we report on our experience of teaching verification in a (hands-on) distributed systems course. We argue that broadcast algorithms provide a sweet spot in design and verification complexity. To this end, we give an implementation of these algorithms in Erlang and derive a TLA+ specification. TLA+ is a formal language for describing and reasoning about distributed and concurrent systems and provides a model checker, TLC, among other things. Our study reveals interesting parallels between the Erlang and TLA+ code, while exposing the challenges of formally modeling communication and parallelism in distributed systems. Presenting selected aspects of our course design, we aim to motivate the feasibility and need for introducing verification in close correspondence to programming tasks.

References

  1. Alzahrani, N., Spichkova, M., Blech, J.O. : From temporal models to property-based testing. In: Damiani, E., Spanoudakis, G., Maciaszek, L.A. (eds.) ENASE 2017-Proceedings of the 12th International Conference on Evaluation of Novel Approaches to Software Engineering, Porto, Portugal, April 28-29, 2017. pp. 241-246. SciTePress ( 2017 ), htps://doi.org/10.5220/0006340302410246Google ScholarGoogle Scholar
  2. Armstrong, J. : Making reliable distributed systems in the presence of software errors. Ph.D. thesis, Royal Institute of Technology, Stockholm, Sweden ( 2003 ), htp://erlang.org/download/armstrong_thesis_2003. pdfGoogle ScholarGoogle Scholar
  3. Birman, K.P., Joseph, T.A. : Reliable communication in the presence of failures. ACM Trans. Comput. Syst. 5 ( 1 ), 47-76 ( 1987 ), htps://doi.org/ 10.1145/7351.7478Google ScholarGoogle Scholar
  4. Cachin, C., Guerraoui, R., Rodrigues, L.E.T. : Introduction to Reliable and Secure Distributed Programming (2. ed.). Springer ( 2011 ), htps: //doi.org/10.1007/978-3-642-15260-3Google ScholarGoogle Scholar
  5. Cassar, I., Francalanza, A. : On synchronous and asynchronous monitor instrumentation for actor-based systems. In: Cámara, J., Proença, J. (eds.) Proceedings 13th International Workshop on Foundations of Coordination Languages and Self-Adaptive Systems, FOCLASA 2014, Rome, Italy, 6th September 2014. EPTCS, vol. 175, pp. 54-68 ( 2014 ), htps://doi.org/10.4204/EPTCS.175.4Google ScholarGoogle Scholar
  6. Défago, X., Schiper, A., Urbán, P. : Total order broadcast and multicast algorithms: Taxonomy and survey. ACM Comput. Surv. 36 ( 4 ), 372-421 ( 2004 ), htps://doi.org/10.1145/1041680.1041682Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Demirbas, M. : My experience with using tla+ in distributed systems class. htp://muratbufalo.blogspot.com/ 2014 /08/using-tla-forteaching-distributed.html, accessed: 2019-06-14Google ScholarGoogle Scholar
  8. Gotovos, A., Christakis, M., Sagonas, K. : Test-driven development of concurrent programs using concuerror. In: Rikitake, K., Stenman, E. (eds.) Proceedings of the 10th ACM SIGPLAN workshop on Erlang, Tokyo, Japan, September 23, 2011. pp. 51-61. ACM ( 2011 ), htps://doi. org/10.1145/2034654.2034664Google ScholarGoogle Scholar
  9. Gotovos, A., Christakis, M., Sagonas, K. : Test-driven development of concurrent programs using concuerror. In: Proceedings of the 10th ACM SIGPLAN Workshop on Erlang. p. 51-61. Erlang ' 11, Association for Computing Machinery, New York, NY, USA ( 2011 ), htps://doi.org/ 10.1145/2034654.2034664Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. Hadzilacos, V., Toueg, S.: A modular approach to fault-tolerant broadcasts and related problems. Tech. rep., USA ( 1994 )Google ScholarGoogle Scholar
  11. Jin, X., Li, X., Zhang, H., Foster, N., Lee, J., Soulé, R., Kim, C., Stoica, I. : Netchain: Scale-free sub-rtt coordination. In: 15th USENIX Symposium on Networked Systems Design and Implementation (NSDI 18). pp. 35-49. USENIX Association, Renton, WA ( 2018 ), htps://www.usenix.org/ conference/nsdi18/presentation/jinGoogle ScholarGoogle Scholar
  12. Lamport, L. : The temporal logic of actions. ACM Trans. Program. Lang. Syst. 16 ( 3 ), 872-923 ( 1994 ), htps://doi.org/10.1145/177492.177726Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Lamport, L. : Specifying Systems, The TLA+ Language and Tools for Hardware and Software Engineers. Addison-Wesley ( 2002 )Google ScholarGoogle Scholar
  14. Lamport, L. : Teaching concurrency. SIGACT News 40 ( 1 ), 58-62 ( 2009 ), htps://doi.org/10.1145/1515698.1515713Google ScholarGoogle ScholarDigital LibraryDigital Library
  15. Michael, E., Woos, D., Anderson, T., Ernst, M.D., Tatlock, Z. : Teaching rigorous distributed systems with eficient model checking. In: Proceedings of the Fourteenth EuroSys Conference 2019. EuroSys '19, Association for Computing Machinery, New York, NY, USA ( 2019 ), htps://doi.org/10.1145/3302424.3303947Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Mu, S., Nelson, L., Lloyd, W., Li, J. : Consolidating concurrency control and consensus for commits under conflicts. In: 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI'16). pp. 517-532 ( 2016 )Google ScholarGoogle Scholar
  17. Newcombe, C. : Why Amazon chose TLA+. In: Abstract State Machines, Alloy, B, TLA, VDM, and Z. pp. 25-39. Springer ( 2014 ). https://doi.org/10.1007/978-3-662-43652-3 Google ScholarGoogle ScholarCross RefCross Ref
  18. Newcombe, C., Rath, T., Zhang, F., Munteanu, B., Brooker, M., Deardeuf, M. : How Amazon web services uses formal methods. Communications of the ACM 58 ( 4 ), 66-73 ( 2015 )Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Raynal, M. : Fault-Tolerant Message-Passing Distributed Systems-An Algorithmic Approach. Springer ( 2018 ), htps://doi.org/10.1007/978-3-319-94141-7Google ScholarGoogle Scholar
  20. Schneider, F.B. : Implementing fault-tolerant services using the state machine approach: A tutorial. ACM Comput. Surv. 22 ( 4 ), 299-319 ( 1990 ), htps://doi.org/10.1145/98163.98167Google ScholarGoogle ScholarDigital LibraryDigital Library
  21. Svensson, H., Fredlund, L., Earle, C.B. : A unified semantics for future erlang. In: Fritchie, S.L., Sagonas, K. (eds.) Proceedings of the 9th ACM SIGPLAN workshop on Erlang, Baltimore, Maryland, USA, September 30, 2010. pp. 23-32. ACM ( 2010 ), htps://doi.org/10.1145/1863509. 1863514Google ScholarGoogle Scholar

Index Terms

  1. Teaching practical realistic verification of distributed algorithms in Erlang with TLA+

          Recommendations

          Comments

          Login options

          Check if you have access through your login credentials or your institution to get full access on this article.

          Sign in
          • Published in

            cover image ACM Conferences
            Erlang 2020: Proceedings of the 19th ACM SIGPLAN International Workshop on Erlang
            August 2020
            52 pages
            ISBN:9781450380492
            DOI:10.1145/3406085

            Copyright © 2020 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Publication History

            • Published: 23 August 2020

            Permissions

            Request permissions about this article.

            Request Permissions

            Check for updates

            Qualifiers

            • research-article

            Acceptance Rates

            Overall Acceptance Rate51of68submissions,75%

            Upcoming Conference

            ICFP '24

          PDF Format

          View or Download as a PDF file.

          PDF

          eReader

          View online with eReader.

          eReader