DOI:10.1145/2815400.2815427

Paxos made transparent

Published: 04 October 2015

Abstract

State machine replication (SMR) leverages distributed consensus protocols such as Paxos to keep multiple replicas of a program consistent in the face of replica failures or network partitions. This fault tolerance makes it enticing to implement a principled SMR system that replicates general programs, especially server programs that demand high availability. Unfortunately, SMR assumes deterministic execution, but most server programs are multithreaded and thus nondeterministic. Moreover, existing SMR systems provide narrow state machine interfaces to suit specific programs, and it can be quite strenuous and error-prone to orchestrate a general program into these interfaces.
This paper presents Crane, an SMR system that transparently replicates general server programs. Crane achieves distributed consensus on the socket API, a common interface to almost all server programs. It leverages deterministic multithreading (specifically, our prior system Parrot) to make multithreaded replicas deterministic. It uses a new technique we call time bubbling to efficiently tackle the difficult challenge of nondeterministic network input timing. Evaluation on five widely used server programs (e.g., Apache, ClamAV, and MySQL) shows that Crane is easy to use, has moderate overhead, and is robust. Crane's source code is at github.com/columbia/crane.
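
The determinism assumption in the abstract can be illustrated with a small sketch. This is not Crane's implementation: the `Replica` class, the key-value operations, and the hard-coded `agreed_log` are hypothetical stand-ins, and the log simply plays the role of the sequence a consensus protocol such as Paxos would agree on. The sketch shows only that replicas applying the same inputs in the same order end in the same state, and that a different order may not.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Replica:
    """A toy deterministic state machine (hypothetical): a key-value store."""
    name: str
    store: dict = field(default_factory=dict)

    def apply(self, op):
        # Each op is a (kind, key, value) tuple. The transition depends only
        # on the current state and the op, never on timing or thread order.
        kind, key, value = op
        if kind == "set":
            self.store[key] = value
        elif kind == "del":
            self.store.pop(key, None)
        else:
            raise ValueError(f"unknown op kind: {kind!r}")

    def digest(self):
        # Hash a canonical (sorted) view of the state so replicas can be compared.
        canonical = repr(sorted(self.store.items())).encode()
        return hashlib.sha256(canonical).hexdigest()


def replay(log, names):
    """Apply the same log, in the same order, on one fresh replica per name."""
    replicas = [Replica(n) for n in names]
    for op in log:
        for r in replicas:
            r.apply(op)
    return replicas


# Assumed stand-in for the input sequence that consensus would fix.
agreed_log = [("set", "a", 1), ("set", "b", 2), ("del", "a", None), ("set", "b", 3)]

replicas = replay(agreed_log, ["r1", "r2", "r3"])
digests = {r.digest() for r in replicas}

# A replica that sees the same inputs in a different order can diverge.
diverged = replay(agreed_log[::-1], ["r4"])[0]
```

Here `digests` collapses to a single value because every replica saw the same ordered log, while `diverged` ends in a different state. A multithreaded server adds a second source of divergence, thread scheduling, which is the part the abstract attributes to deterministic multithreading.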

Published In

SOSP '15: Proceedings of the 25th Symposium on Operating Systems Principles
October 2015
499 pages
ISBN:9781450338349
DOI:10.1145/2815400
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. fault tolerance
  2. software reliability
  3. stable and deterministic multithreading
  4. state machine replication

Qualifiers

  • Research-article

Conference

SOSP '15

Acceptance Rates

SOSP '15 Paper Acceptance Rate 30 of 181 submissions, 17%;
Overall Acceptance Rate 174 of 961 submissions, 18%
