DOI: 10.1145/2043556.2043592

Transactional storage for geo-replicated systems

Published: 23 October 2011

ABSTRACT

We describe the design and implementation of Walter, a key-value store that supports transactions and replicates data across distant sites. A key feature behind Walter is a new property called Parallel Snapshot Isolation (PSI). PSI allows Walter to replicate data asynchronously, while providing strong guarantees within each site. PSI precludes write-write conflicts, so that developers need not worry about conflict-resolution logic. To prevent write-write conflicts and implement PSI, Walter uses two new and simple techniques: preferred sites and counting sets. We use Walter to build a social networking application and port a Twitter-like application.
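
The abstract names counting sets without defining them. As a rough illustration only, assume a counting set keeps an integer count per element, so that adding and removing an element are an increment and a decrement that commute with each other. Under that assumption, a minimal Python sketch might look like the following. The class name `CountingSet`, its methods, and the sample operations are hypothetical and are not Walter's API.

```python
from collections import defaultdict


class CountingSet:
    """Illustrative counting set: each element maps to an integer count.

    Assumption for this sketch: counts may go negative, so a remove that
    arrives before its matching add still cancels out.
    """

    def __init__(self):
        self.counts = defaultdict(int)

    def add(self, elem):
        # Adding is an increment, so it commutes with any other add/remove.
        self.counts[elem] += 1

    def remove(self, elem):
        # Removing is a decrement rather than a deletion.
        self.counts[elem] -= 1

    def members(self):
        # Treat elements with a positive count as "present".
        return {e for e, c in self.counts.items() if c > 0}


def apply_ops(ops):
    """Apply a list of ("add" | "remove", element) pairs to a fresh set."""
    cs = CountingSet()
    for op, elem in ops:
        if op == "add":
            cs.add(elem)
        else:
            cs.remove(elem)
    return cs


# Hypothetical updates issued concurrently at two sites.
ops_site_a = [("add", "alice"), ("remove", "bob")]
ops_site_b = [("add", "bob"), ("add", "alice")]

# Either replay order produces the same counts, so no write-write
# conflict has to be resolved between the two sites.
first = apply_ops(ops_site_a + ops_site_b)
second = apply_ops(ops_site_b + ops_site_a)
print(dict(first.counts) == dict(second.counts))
```

Because every operation is an increment or decrement, replicas in this sketch can apply remote updates in any order and still converge, which is the property that lets such updates avoid conflict-resolution logic.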


Published in

SOSP '11: Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles
October 2011
417 pages
ISBN: 9781450309776
DOI: 10.1145/2043556

Copyright © 2011 ACM

Publisher

Association for Computing Machinery, New York, NY, United States
