Abstract
The Paxos algorithm is famously difficult to reason about and even more so to implement, despite having been synonymous with distributed consensus for over a decade. The recently proposed Raft protocol lays claim to being a new, understandable consensus algorithm, improving on Paxos without making compromises in performance or correctness.
In this study, we repeat the Raft authors' performance analysis. We develop a clean-slate implementation of the Raft protocol and build an event-driven simulation framework for prototyping it on experimental topologies. We propose several optimizations to the Raft protocol and demonstrate their effectiveness under contention. Finally, we empirically validate the correctness of the Raft protocol invariants and evaluate Raft's understandability claims.