skip to main content
tutorial

Paxos Made Moderately Complex

Published: 17 February 2015 Publication History

Abstract

This article explains the full reconfigurable multidecree Paxos (or multi-Paxos) protocol. Paxos is by no means a simple protocol, even though it is based on relatively simple invariants. We provide pseudocode and explain it guided by invariants. We initially avoid optimizations that complicate comprehension. Next we discuss liveness, list various optimizations that make the protocol practical, and present variants of the protocol.

References

[1]
Peter Alvaro, Tyson Condie, Neil Conway, Joseph M. Hellerstein, and Russell Sears. 2010. I do declare: Consensus in a logic language. SIGOPS Operating Systems Review 43, 4, 25--30.
[2]
Hagit Attiya, Amotz Bar-Noy, and Danny Dolev. 1995. Sharing memory robustly in message-passing systems. Journal of the ACM 42, 1, 124--142.
[3]
Romain Boichat, Partha Dutta, Svend Frølund, and Rachid Guerraoui. 2003. Deconstructing Paxos. SIGACT News 34, 1, 47--67.
[4]
William J. Bolosky, Dexter Bradshaw, Randolph B. Haagens, Norbert P. Kusters, and Peng Li. 2011. Paxos replicated state machines as the basis of a high-performance data store. In Proceedings of the 8th USENIX Conference on Networked Systems Design and Implementation (NSDI'11). 141--154.
[5]
Navin Budhiraja, Keith Marzullo, Fred B. Schneider, and Sam Toueg. 1993. The primary-backup approach. In Distributed Systems (2nd ed.), S. Mullender (Ed.). ACM Press/Addison-Wesley, New York, NY, 199--216.
[6]
Mike Burrows. 2006. The chubby lock service for loosely-coupled distributed systems. In Proceedings of the 7th Symposium on Operating Systems Design and Implementation (OSDI'06). 335--350.
[7]
Tushar Deepak Chandra and Sam Toueg. 1991. Unreliable failure detectors for asynchronous systems (preliminary version). In Proceedings of the 10th Annual ACM Symposium on Principles of Distributed Computing (PODC'91). ACM, New York, NY, 325--340.
[8]
Roberto De Prisco, Butler W. Lampson, and Nancy Lynch. 2000. Revisiting the Paxos algorithm. Theoretical Computer Science 243, 1--2, 35--91.
[9]
Michael J. Fischer, Nancy A. Lynch, and Michael S. Paterson. 1985. Impossibility of distributed consensus with one faulty process. Journal of the ACM 32, 2, 374--382.
[10]
Eli Gafni and Leslie Lamport. 2003. Disk Paxos. Distributed Computing 16, 1, 1--20.
[11]
Cary G. Gray and David R. Cheriton. 1989. Leases: An efficient fault-tolerant mechanism for distributed file cache consistency. In Proceedings of the 12th ACM Symposium on Operating Systems Principles (SOSP'89). ACM, New York, NY, 202--210.
[12]
Jonathan Kirsch and Yair Amir. 2008a. Paxos for System Builders. Technical Report CNDS-2008-2. Johns Hopkins University, Baltimore, MD.
[13]
Jonathan Kirsch and Yair Amir. 2008b. Paxos for system builders: An overview. In Proceedings of the 2nd Workshop on Large-Scale Distributed Systems and Middleware (LADIS'08). ACM, New York, NY, Article No. 3.
[14]
Leslie Lamport. 1978. Time, clocks, and the ordering of events in a distributed system. Communations of the ACM 21, 7, 558--565.
[15]
Leslie Lamport. 1998. The part-time parliament. ACM Transactions on Computer Systems 16, 2, 133--169.
[16]
Leslie Lamport. 2001. Paxos made simple. ACM SIGACT News 32, 4, 51--58.
[17]
Leslie Lamport. 2005. Generalized Consensus and Paxos. Technical Report MSR-TR-2005-33. Microsoft Research, Mountain View, CA. Available at http://research.microsoft.com/pubs/64631/tr-2005-33.pdf
[18]
Leslie Lamport. 2006. Fast Paxos. Distributed Computing 19, 2, 79--103.
[19]
Leslie Lamport, Dahlia Malkhi, and Lidong Zhou. 2008. Stoppable Paxos. Technical Report. Microsoft Research, Mountain View, CA. Available at http://research.microsoft.com/apps/pubs/default.aspx?id=101826
[20]
Leslie Lamport, Dahlia Malkhi, and Lidong Zhou. 2009. Vertical Paxos and primary-backup replication. In Proceedings of the 28th ACM Symposium on Principles of Distributed Computing (PODC'09). ACM, New York, NY, 312--313.
[21]
Leslie Lamport and Mike Massa. 2004. Cheap Paxos. In Proceedings of the 2004 International Conference on Dependable Systems and Networks (DSN'04). IEEE, Los Alamitos, CA, 307--315.
[22]
Butler W. Lampson. 1996. How to build a highly available system using consensus. In Proceedings of the 10th International Workshop on Distributed Algorithms (WDAG'96). 1--17.
[23]
Butler W. Lampson. 2001. The ABCD's of Paxos. In Proceedings of the 20th Annual ACM Symposium on Principles of Distributed Computing (PODC'01). ACM, New York, NY, 13--14.
[24]
Harry C. Li, Allen Clement, Amitanand S. Aiyer, and Lorenzo Alvisi. 2007. The Paxos register. In Proceedings of the 26th IEEE International Symposium on Reliable Distributed Systems (SRDS'07). IEEE, Los Alamitos, CA, 114--126.
[25]
Jacob R. Lorch, Atul Adya, William J. Bolosky, Ronnie Chaiken, John R. Douceur, and Jon Howell. 2006. The SMART way to migrate replicated stateful services. In Proceedings of the 1st ACM SIGOPS/EuroSys European Conference on Computer Systems 2006 (EuroSys'06). ACM, New York, NY, 103--115.
[26]
Yanhua Mao, Flavio P. Junqueira, and Keith Marzullo. 2008. Mencius: Building efficient replicated state machines for WANs. In Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation (OSDI'08). 369--384.
[27]
David Mazières. 2007. Paxos Made Practical. Technical Report. Stanford University, Stanford, CA. Available at http://www.scs.stanford.edu/∼dm/home/papers/paxos.pdf
[28]
Hein Meling and Leander Jehl. 2013. Tutorial summary: Paxos explained from scratch. In Principles of Distributed Systems. Lecture Notes in Computer Science, Vol. 8304. Springer, 1--10.
[29]
C. Mohan, Don Haderle, Bruce Lindsay, Hamid Pirahesh, and Peter Schwarz. 1992. ARIES: A transaction recovery method supporting fine-granularity locking and partial rollbacks using write-ahead logging. ACM Transactions on Database Systems 17, 1, 94--162.
[30]
Iulian Moraru, David G. Andersen, and Michael Kaminsky. 2013. There is more consensus in egalitarian parliaments. In Proceedings of the 24th ACM Symposium on Operating Systems Principles (SOSP'13). ACM, New York, NY, 358--372.
[31]
Brian M. Oki and Barbara H. Liskov. 1988. Viewstamped replication: A new primary copy method to support highly-available distributed systems. In Proceedings of the 7th Annual ACM Symposium on Principles of Distributed Computing (PODC'88). ACM, New York, NY, 8--17.
[32]
Diego Ongaro and John Ousterhout. 2014. In search of an understandable consensus algorithm. In Proceedings of the 2014 USENIX Annual Technical Conference (ATC'14). 305--319.
[33]
Richard D. Schlichting and Fred B. Schneider. 1983. Fail-stop processors: An approach to designing fault-tolerant computing systems. ACM Transactions on Computer Systems 1, 3, 222--238.
[34]
Fred B. Schneider. 1990. Implementing fault-tolerant services using the state machine approach: A tutorial. ACM Computing Surveys 22, 4, 299--319.
[35]
Robert H. Thomas. 1979. A majority consensus approach to concurrency control for multiple copy databases. ACM Transactions on Database Systems 4, 2, 180--209.
[36]
Robbert van Renesse, Nicolas Schiper, and Fred B. Schneider. 2014. Vive la différence: Paxos vs. Viewstamped Replication vs. Zab. IEEE Transactions on Dependable and Secure Computing PP, 99, 1.
[37]
Robbert van Renesse, Fred B. Schneider, and Johannes Gehrke. 2011. Nerio: Leader Election and Edict Ordering. Technical Report. Cornell University, Ithaca, NY. Available at http://hdl.handle.net/1813/23631

Cited By

View all
  • (2024)Boki: Towards Data Consistency and Fault Tolerance with Shared Logs in Stateful Serverless ComputingACM Transactions on Computer Systems10.1145/365307242:3-4(1-35)Online publication date: 12-Sep-2024
  • (2024)Raft Protocol for Fault Tolerance and Self-Recovery in Federated LearningProceedings of the 19th International Symposium on Software Engineering for Adaptive and Self-Managing Systems10.1145/3643915.3644093(110-121)Online publication date: 15-Apr-2024
  • (2024)Bigger, not Badder: Safely Scaling BFT ProtocolsProceedings of the 11th Workshop on Principles and Practice of Consistency for Distributed Data10.1145/3642976.3653033(30-36)Online publication date: 22-Apr-2024
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Computing Surveys
ACM Computing Surveys  Volume 47, Issue 3
April 2015
602 pages
ISSN:0360-0300
EISSN:1557-7341
DOI:10.1145/2737799
  • Editor:
  • Sartaj Sahni
Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 17 February 2015
Accepted: 01 September 2014
Revised: 01 June 2014
Received: 01 August 2013
Published in CSUR Volume 47, Issue 3

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. Replicated state machines
  2. consensus
  3. voting

Qualifiers

  • Tutorial
  • Research
  • Refereed

Funding Sources

  • DARPA
  • NSF
  • AFOSR

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)142
  • Downloads (Last 6 weeks)8
Reflects downloads up to 17 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2024)Boki: Towards Data Consistency and Fault Tolerance with Shared Logs in Stateful Serverless ComputingACM Transactions on Computer Systems10.1145/365307242:3-4(1-35)Online publication date: 12-Sep-2024
  • (2024)Raft Protocol for Fault Tolerance and Self-Recovery in Federated LearningProceedings of the 19th International Symposium on Software Engineering for Adaptive and Self-Managing Systems10.1145/3643915.3644093(110-121)Online publication date: 15-Apr-2024
  • (2024)Bigger, not Badder: Safely Scaling BFT ProtocolsProceedings of the 11th Workshop on Principles and Practice of Consistency for Distributed Data10.1145/3642976.3653033(30-36)Online publication date: 22-Apr-2024
  • (2024)Optimizing Distributed Protocols with Query RewritesProceedings of the ACM on Management of Data10.1145/36392572:1(1-25)Online publication date: 26-Mar-2024
  • (2024)Simpler is Better: Revisiting Mencius State Machine Replication2024 43rd International Symposium on Reliable Distributed Systems (SRDS)10.1109/SRDS64841.2024.00026(175-186)Online publication date: 30-Sep-2024
  • (2024)Self-stabilizing Multivalued Consensus in Asynchronous Crash-prone SystemsTheoretical Computer Science10.1016/j.tcs.2024.114886(114886)Online publication date: Sep-2024
  • (2024)Self-stabilizing indulgent zero-degrading binary consensusTheoretical Computer Science10.1016/j.tcs.2024.114387989(114387)Online publication date: Mar-2024
  • (2024)Antipaxos: Taking interactive consistency to the next levelJournal of Parallel and Distributed Computing10.1016/j.jpdc.2024.104839187(104839)Online publication date: May-2024
  • (2024)An in-depth and insightful exploration of failure detection in distributed systemsComputer Networks10.1016/j.comnet.2024.110432247(110432)Online publication date: Jun-2024
  • (2024)vCubeChainAd Hoc Networks10.1016/j.adhoc.2024.103461158:COnline publication date: 1-May-2024
  • Show More Cited By

View Options

Login options

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media