research-article

Transactional storage for geo-replicated systems

Authors:
Yair Sovran, New York University
Russell Power, New York University
Marcos K. Aguilera, Microsoft Research Silicon Valley
Jinyang Li, New York University

SOSP '11: Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles, October 2011, pages 385–400. https://doi.org/10.1145/2043556.2043592

Published: 23 October 2011

ABSTRACT

We describe the design and implementation of Walter, a key-value store that supports transactions and replicates data across distant sites. A key feature behind Walter is a new property called Parallel Snapshot Isolation (PSI). PSI allows Walter to replicate data asynchronously, while providing strong guarantees within each site. PSI precludes write-write conflicts, so that developers need not worry about conflict-resolution logic. To prevent write-write conflicts and implement PSI, Walter uses two new and simple techniques: preferred sites and counting sets. We use Walter to build a social networking application and port a Twitter-like application.
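
The abstract names counting sets without defining them. The following is a minimal sketch, not Walter's implementation, assuming a counting set maps each element to an integer count that an add increments and a remove decrements, so that updates commute and replicas applying them in different orders reach the same state. The class name `CountingSet`, its methods, and the `replay` helper are hypothetical and chosen for illustration only.

```python
class CountingSet:
    """Hypothetical counting set: each element maps to an integer count."""

    def __init__(self):
        self._counts = {}

    def add(self, elem):
        self._apply(elem, +1)

    def remove(self, elem):
        # Assumption: counts may go negative, so a remove can be applied
        # before the matching add arrives from another site.
        self._apply(elem, -1)

    def _apply(self, elem, delta):
        new_count = self._counts.get(elem, 0) + delta
        if new_count == 0:
            # Drop zero entries so equal states compare equal.
            self._counts.pop(elem, None)
        else:
            self._counts[elem] = new_count

    def count(self, elem):
        return self._counts.get(elem, 0)

    def snapshot(self):
        return dict(self._counts)


def replay(ops):
    """Apply a sequence of ("add" | "remove", element) operations in order."""
    cset = CountingSet()
    for op, elem in ops:
        if op == "add":
            cset.add(elem)
        else:
            cset.remove(elem)
    return cset.snapshot()


# Updates issued concurrently at two sites (illustrative data only).
site_a_ops = [("add", "alice"), ("add", "bob")]
site_b_ops = [("remove", "bob"), ("add", "carol")]

# Two replicas receive the same updates in different orders.
order_1 = replay(site_a_ops + site_b_ops)
order_2 = replay(site_b_ops + site_a_ops)
```

Because each operation only adds a fixed delta to a count, the two replay orders yield the same snapshot, which is the sense in which such updates avoid write-write conflicts in this sketch.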

Published in
SOSP '11: Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles
October 2011
417 pages
ISBN: 9781450309776
DOI: 10.1145/2043556
General Chair: Ted Wobber (MSR Silicon Valley)
Program Chair: Peter Druschel (MPI-SWS)
Copyright © 2011 ACM
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
Publisher
Association for Computing Machinery
New York, NY, United States
Author Tags
asynchronous replication
distributed storage
geo-distributed systems
key-value store
parallel snapshot isolation
transactions
Qualifiers
- research-article
Acceptance Rates
Overall Acceptance Rate: 131 of 716 submissions, 18%