DOI: 10.1145/2592798.2592800
research-article

Rex: replication at the speed of multi-core

Published: 14 April 2014

ABSTRACT

Standard state-machine replication involves consensus on a sequence of totally ordered requests through, for example, the Paxos protocol. Such a sequential execution model is becoming outdated on prevalent multi-core servers. Highly concurrent executions on multi-core architectures introduce non-determinism related to thread scheduling and lock contention, and fundamentally break the determinism assumption underlying state-machine replication. This tension between concurrency and consistency is not inherent, because the total ordering of requests is merely a simplifying convenience that is unnecessary for consistency. Concurrent executions of the application can be decoupled from the sequence of consensus decisions by reaching consensus on partial-order traces rather than on totally ordered requests; these traces capture the non-deterministic decisions made in one replica's execution so that the other replicas can replay the execution with the same decisions. The result is a new multi-core friendly replicated state-machine framework that achieves strong consistency while preserving parallelism in multi-threaded applications. On 12-core machines with hyper-threading, evaluations on typical applications show that we can scale with the number of cores, achieving up to 16 times the throughput of standard replicated state machines.
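The record-then-replay idea in the abstract can be illustrated with a toy sketch. This is not the Rex implementation: the class names `RecordingLock` and `ReplayLock`, the `run` helper, and the single-process setup are all assumptions made for illustration, and the consensus step that would agree on the trace across replicas is omitted entirely. The sketch only shows one non-deterministic decision (the order in which threads acquire a lock) being captured in one execution and enforced in another. With several locks, each would keep its own trace and acquisitions of different locks would stay unordered relative to one another, which is one way a partial order rather than a total order could arise.

```python
import threading


class RecordingLock:
    """Hypothetical 'primary' lock: logs which worker acquired it, in order."""

    def __init__(self):
        self._lock = threading.Lock()
        self.trace = []  # worker ids in acquisition order

    def acquire(self, worker_id):
        self._lock.acquire()
        self.trace.append(worker_id)  # appended while holding the lock

    def release(self):
        self._lock.release()


class ReplayLock:
    """Hypothetical 'secondary' lock: grants itself only in the traced order."""

    def __init__(self, trace):
        self._trace = list(trace)
        self._next = 0        # index of the next acquisition to allow
        self._held = False
        self._cond = threading.Condition()

    def acquire(self, worker_id):
        with self._cond:
            # Block until the lock is free and the trace says it is our turn.
            self._cond.wait_for(
                lambda: not self._held
                and self._next < len(self._trace)
                and self._trace[self._next] == worker_id
            )
            self._held = True
            self._next += 1

    def release(self):
        with self._cond:
            self._held = False
            self._cond.notify_all()


def run(lock, n_workers=4, ops_per_worker=3):
    """Run workers that apply an order-dependent update under the lock."""
    state = []

    def worker(wid):
        for _ in range(ops_per_worker):
            lock.acquire(wid)
            state.append(wid)  # final state depends on acquisition order
            lock.release()

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state


# "Primary" runs freely and records its scheduling decisions.
primary_lock = RecordingLock()
primary_state = run(primary_lock)

# "Secondary" runs the same workload but follows the recorded trace.
secondary_state = run(ReplayLock(primary_lock.trace))
```

In this sketch both runs remain multi-threaded; the second run is constrained only at lock acquisitions, so the two final states match even though the first run's interleaving was chosen by the scheduler.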


        Published in

        EuroSys '14: Proceedings of the Ninth European Conference on Computer Systems
        April 2014
        388 pages
        ISBN: 9781450327046
        DOI: 10.1145/2592798

        Copyright © 2014 ACM


        Publisher

        Association for Computing Machinery

        New York, NY, United States


        Acceptance Rates

        EuroSys '14 paper acceptance rate: 27 of 147 submissions, 18%
        Overall acceptance rate: 241 of 1,308 submissions, 18%
