ABSTRACT
Most geo-replicated storage systems use weak consistency to avoid the performance penalty of coordinating replicas in different data centers. This departure from strong semantics poses problems for application programmers, who must handle the anomalies that weak consistency allows. In this paper we use a recently proposed isolation level, Non-Monotonic Snapshot Isolation, to achieve ACID transactions with low latency. To this end, we present Blotter, a geo-replicated system that leverages these semantics in the design of a new concurrency control protocol. The protocol leaves a small amount of local state during reads to make commits more efficient, and is combined with a configuration of Paxos tailored for good performance in wide-area settings. Read operations always run in the local data center, and update transactions complete in a small number of message steps to a subset of the replicas. We implemented Blotter as an extension to Cassandra. Our experimental evaluation shows that Blotter has a small overhead at the data center scale, and performs better across data centers than our implementations of the core Spanner protocol and of Snapshot Isolation on the same codebase.
Index Terms
- Blotter: Low Latency Transactions for Geo-Replicated Storage