Blotter: Low Latency Transactions for Geo-Replicated Storage

Authors:
Henrique Moniz (Google, New York, NY, USA)
João Leitão (Universidade NOVA de Lisboa, Lisboa, Portugal)
Ricardo J. Dias (NOVA LINCS & SUSE Linux GmbH, Lisboa, Portugal)
Johannes Gehrke (Microsoft, Seattle, WA, USA)
Nuno Preguiça (Universidade NOVA de Lisboa, Lisboa, Portugal)
Rodrigo Rodrigues (Universidade de Lisboa, Lisboa, Portugal)

WWW '17: Proceedings of the 26th International Conference on World Wide Web, April 2017, Pages 263–272. https://doi.org/10.1145/3038912.3052603

Published: 03 April 2017

ABSTRACT

Most geo-replicated storage systems use weak consistency to avoid the performance penalty of coordinating replicas in different data centers. This departure from strong semantics poses problems for application programmers, who must handle the anomalies that weak consistency allows. In this paper we use a recently proposed isolation level, Non-Monotonic Snapshot Isolation, to achieve ACID transactions with low latency. To this end, we present Blotter, a geo-replicated system that leverages these semantics in the design of a new concurrency control protocol. The protocol leaves a small amount of local state during reads to make commits more efficient, and is combined with a configuration of Paxos tailored for good performance in wide-area settings. Read operations always run in the local data center, and update transactions complete in a small number of message steps to a subset of the replicas. We implemented Blotter as an extension to Cassandra. Our experimental evaluation shows that Blotter has a small overhead at the data center scale, and performs better across data centers than our implementations of the core Spanner protocol and of Snapshot Isolation on the same codebase.


Published in
WWW '17: Proceedings of the 26th International Conference on World Wide Web
April 2017
1678 pages
ISBN: 9781450349130
General Chairs: Rick Barrett (W3Events), Rick Cummings (Murdoch University)
Program Chairs: Eugene Agichtein (Emory University), Evgeniy Gabrilovich (Google Research)
Copyright © 2017. Copyright is held by the International World Wide Web Conference Committee (IW3C2).
Publisher
International World Wide Web Conferences Steering Committee
Republic and Canton of Geneva, Switzerland
Author Tags
concurrency control
distributed database systems
distributed transactions
geo-replicated storage
paxos
Qualifiers
- research-article
Acceptance Rates
WWW '17 Paper Acceptance Rate: 164 of 966 submissions, 17%
Overall Acceptance Rate: 1,899 of 8,196 submissions, 23%