ABSTRACT
We describe the design and implementation of Walter, a key-value store that supports transactions and replicates data across distant sites. A key feature behind Walter is a new property called Parallel Snapshot Isolation (PSI). PSI allows Walter to replicate data asynchronously, while providing strong guarantees within each site. PSI precludes write-write conflicts, so that developers need not worry about conflict-resolution logic. To prevent write-write conflicts and implement PSI, Walter uses two new and simple techniques: preferred sites and counting sets. We use Walter to build a social networking application and port a Twitter-like application.
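As a minimal sketch of why counting sets can sidestep write-write conflicts, the example below assumes that a counting set maps each element to an integer count, that adding an element increments its count, and that removing it decrements the count. The class name, method names, and sample data are hypothetical and are not Walter's API. Under these assumptions the operations commute, so replicas that apply the same updates in different orders reach the same state.

```python
from collections import Counter


class CountingSet:
    """Hypothetical counting set: maps element -> integer count (may go negative)."""

    def __init__(self):
        self.counts = Counter()

    def add(self, x):
        # Adding an element increments its count.
        self.counts[x] += 1

    def remove(self, x):
        # Removing an element decrements its count, even below zero.
        self.counts[x] -= 1

    def apply(self, ops):
        # Apply a sequence of ("add" | "remove", element) operations in order.
        for op, x in ops:
            if op == "add":
                self.add(x)
            else:
                self.remove(x)
        return self

    def state(self):
        # Report only elements whose count is nonzero.
        return {x: c for x, c in self.counts.items() if c != 0}


# Updates issued concurrently at two sites (illustrative data only).
ops_site_a = [("add", "alice"), ("add", "bob")]
ops_site_b = [("remove", "bob"), ("add", "carol")]

# Each replica receives the same updates, but in a different order.
replica1 = CountingSet().apply(ops_site_a + ops_site_b)
replica2 = CountingSet().apply(ops_site_b + ops_site_a)

assert replica1.state() == replica2.state()
```

Because the final state does not depend on delivery order, no conflict-resolution logic is needed for this data type in the sketch.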