Brief Announcement: Design and Verification of a Logless Dynamic Reconfiguration Protocol in MongoDB Replication

Authors William Schultz, Siyuan Zhou, Stavros Tripakis



PDF
Thumbnail PDF

File

LIPIcs.DISC.2021.61.pdf
  • Filesize: 0.62 MB
  • 4 pages

Document Identifiers

Author Details

William Schultz
  • Northeastern University, Boston, MA, USA
Siyuan Zhou
  • MongoDB, New York, NY, USA
Stavros Tripakis
  • Northeastern University, Boston, MA, USA

Cite AsGet BibTex

William Schultz, Siyuan Zhou, and Stavros Tripakis. Brief Announcement: Design and Verification of a Logless Dynamic Reconfiguration Protocol in MongoDB Replication. In 35th International Symposium on Distributed Computing (DISC 2021). Leibniz International Proceedings in Informatics (LIPIcs), Volume 209, pp. 61:1-61:4, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2021)
https://doi.org/10.4230/LIPIcs.DISC.2021.61

Abstract

We introduce a novel dynamic reconfiguration protocol for the MongoDB replication system that extends and generalizes the single server reconfiguration protocol of the Raft consensus algorithm. Our protocol decouples the processing of configuration changes from the main database operation log, which allows reconfigurations to proceed in cases when the main log is prevented from processing new operations. Additionally, this decoupling allows for configuration state to be managed by a logless replicated state machine, storing only the latest version of the configuration and avoiding the complexities of a log-based protocol. We present a formal specification of the protocol in TLA+, initial verification results of model checking its safety properties, and an experimental evaluation of how reconfigurations are able to quickly restore a system to healthy operation when node failures have stalled the main operation log. This announcement is a short version and the full paper is available at [Schultz et al., 2021].

Subject Classification

ACM Subject Classification
  • Information systems → Parallel and distributed DBMSs
  • Software and its engineering → Software verification
Keywords
  • Reconfiguration
  • Consensus
  • State Machine Replication

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Marcos Aguilera, Idit Keidar, Dahlia Malkhi, Jean-Philippe Martin, and Alexander Shraer. Reconfiguring Replicated Atomic Storage: A Tutorial. Bulletin of the European Association for Theoretical Computer Science EATCS, 2010. Google Scholar
  2. Tushar D Chandra, Robert Griesemer, and Joshua Redstone. Paxos Made Live: An Engineering Perspective. In Proceedings of the Twenty-Sixth Annual ACM Symposium on Principles of Distributed Computing, PODC '07, pages 398-407, New York, NY, USA, 2007. Association for Computing Machinery. URL: https://doi.org/10.1145/1281100.1281103.
  3. Kaustuv Chaudhuri, Damien Doligez, Leslie Lamport, and Stephan Merz. Verifying safety properties with the tla+ proof system. In International Joint Conference on Automated Reasoning, pages 142-148. Springer, 2010. Google Scholar
  4. Dongxu Huang, Qi Liu, Qiu Cui, Zhuhe Fang, Xiaoyu Ma, Fei Xu, Li Shen, Liu Tang, Yuxing Zhou, Menglong Huang, Wan Wei, Cong Liu, Jian Zhang, Jianjun Li, Xuelian Wu, Lingyu Song, Ruoxi Sun, Shuaipeng Yu, Lei Zhao, Nicholas Cameron, Liquan Pei, and Xin Tang. TiDB: a Raft-based HTAP database. Proceedings of the VLDB Endowment, 2020. URL: https://doi.org/10.14778/3415478.3415535.
  5. Leslie Lamport. The Part-Time Parliament. ACM Transactions on Computer Systems, 1998. URL: https://doi.org/10.1145/279227.279229.
  6. Leslie Lamport. Specifying Systems: The TLA+ Language and Tools for Hardware and Software Engineers. Addison-Wesley, June 2002. Google Scholar
  7. Leslie Lamport, Dahlia Malkhi, and Lidong Zhou. Stoppable paxos. TechReport, Microsoft Research, 2008. Google Scholar
  8. Haojun Ma, Aman Goel, Jean Baptiste Jeannin, Manos Kapritsos, Baris Kasikci, and Karem A. Sakallah. I4: Incremental inference of inductive invariants for verification of distributed protocols. In SOSP 2019 - Proceedings of the 27th ACM Symposium on Operating Systems Principles, 2019. URL: https://doi.org/10.1145/3341301.3359651.
  9. MongoDB Github Project, 2021. URL: https://github.com/mongodb/mongo.
  10. Diego Ongaro. Bug in single-server membership changes, July 2015. URL: https://groups.google.com/g/raft-dev/c/t4xj6dJTP6E/m/d2D9LrWRza8J.
  11. Diego Ongaro and John Ousterhout. In Search of an Understandable Consensus Algorithm. In Proceedings of the 2014 USENIX Conference on USENIX Annual Technical Conference, USENIX ATC'14, pages 305-320, USA, 2014. USENIX Association. Google Scholar
  12. Amir Pnueli. The temporal logic of programs. In 18th Annual Symposium on Foundations of Computer Science (sfcs 1977), pages 46-57. IEEE, 1977. Google Scholar
  13. Fred B. Schneider. Implementing Fault-Tolerant Services Using the State Machine Approach: A Tutorial. ACM Computing Surveys (CSUR), 1990. URL: https://doi.org/10.1145/98163.98167.
  14. William Schultz. MongoRaftReconfig TLA+ Specification, 2021. URL: https://github.com/will62794/logless-reconfig/blob/3d6b378eae7dabe3a35a0b4042bfc55fd178cb21/specs/MongoRaftReconfig.tla.
  15. William Schultz, Tess Avitabile, and Alyson Cabral. Tunable consistency in mongodb. Proceedings of the VLDB Endowment, 12(12):2071-2081, 2019. Google Scholar
  16. William Schultz, Siyuan Zhou, and Stavros Tripakis. Design and verification of a logless dynamic reconfiguration protocol in mongodb replication. arXiv preprint, 2021. URL: http://arxiv.org/abs/2102.11960.
  17. Alexander Shraer, Benjamin Reed, Dahlia Malkhi, and Flavio Junqueira. Dynamic reconfiguration of primary/backup clusters. In Proceedings of the 2012 USENIX Annual Technical Conference, USENIX ATC 2012, 2019. Google Scholar
  18. Rebecca Taft, Irfan Sharif, Andrei Matei, Nathan VanBenschoten, Jordan Lewis, Tobias Grieger, Kai Niemi, Andy Woods, Anne Birzin, Raphael Poss, Paul Bardea, Amruta Ranade, Ben Darnell, Bram Gruneir, Justin Jaffray, Lucy Zhang, and Peter Mattis. CockroachDB: The Resilient Geo-Distributed SQL Database. In Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, SIGMOD '20, pages 1493-1509, New York, NY, USA, 2020. Association for Computing Machinery. URL: https://doi.org/10.1145/3318464.3386134.
  19. Yuan Yu, Panagiotis Manolios, and Leslie Lamport. Model checking tla+ specifications. In Advanced Research Working Conference on Correct Hardware Design and Verification Methods, pages 54-66. Springer, 1999. Google Scholar
  20. Jianjun Zheng, Qian Lin, Jiatao Xu, Cheng Wei, Chuwei Zeng, Pingan Yang, and Yunfan Zhang. PaxosStore: High-availability storage made practical in WeChat. In Proceedings of the VLDB Endowment, 2017. URL: https://doi.org/10.14778/3137765.3137778.
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail