Abstract
Strongly consistent distributed systems are easy to reason about but face fundamental limitations in availability and performance. Weakly consistent systems can be implemented with very high performance but place a burden on the application developer to reason about complex interleavings of execution. Invariant confluence provides a formal framework for understanding when we can get the best of both worlds. An invariant confluent object can be efficiently replicated with no coordination needed to preserve its invariants. However, determining whether an object is invariant confluent is challenging. In this paper, we establish conditions under which a commonly used sufficient condition for invariant confluence is both necessary and sufficient, and we use this condition to design (a) a general-purpose interactive invariant confluence decision procedure and (b) a novel sufficient condition that can be checked automatically. We then go beyond invariant confluence and introduce a generalization, called segmented invariant confluence, that allows us to replicate objects that are not invariant confluent with a small amount of coordination. We implemented these formalisms in a prototype called Lucy and found that our decision procedures efficiently handle common real-world workloads including foreign keys, rollups, escrow transactions and more. We also found that segmented invariant confluent replication can deliver up to an order of magnitude more throughput than linearizable replication for low-contention workloads and comparable throughput for medium-to-high-contention workloads.
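As an illustrative sketch (not code from Lucy), consider a bank balance replicated at two sites. We assume the state is a pair of per-site deposit totals and a pair of per-site withdrawal totals, merged by pointwise maximum, and that the invariant is balance >= 0. The names `Account`, `withdraw`, and `invariant` are hypothetical. Each site checks the invariant before committing a withdrawal, yet the merged state can still violate it, so this object is not invariant confluent and needs some coordination.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Account:
    # Per-site totals (site 0, site 1). Totals only grow, and join takes the
    # pointwise maximum, so states form a join semilattice.
    deposits: tuple = (0, 0)
    withdrawals: tuple = (0, 0)

    def balance(self) -> int:
        return sum(self.deposits) - sum(self.withdrawals)

    def join(self, other: "Account") -> "Account":
        return Account(
            tuple(max(a, b) for a, b in zip(self.deposits, other.deposits)),
            tuple(max(a, b) for a, b in zip(self.withdrawals, other.withdrawals)),
        )


def invariant(s: Account) -> bool:
    # Hypothetical application invariant: the balance never goes negative.
    return s.balance() >= 0


def withdraw(s: Account, site: int, amount: int) -> Account:
    # Commit the withdrawal at `site` only if the result is locally I-valid.
    w = list(s.withdrawals)
    w[site] += amount
    candidate = Account(s.deposits, tuple(w))
    return candidate if invariant(candidate) else s


start = Account(deposits=(1, 0))  # balance 1, already replicated to both sites
r0 = withdraw(start, site=0, amount=1)  # site 0 sees balance 1 and withdraws
r1 = withdraw(start, site=1, amount=1)  # site 1 concurrently does the same
merged = r0.join(r1)
print(invariant(r0), invariant(r1), merged.balance(), invariant(merged))
# prints: True True -1 False
```

Both sites end in I-valid states, but their join does not. Escrow-style techniques sidestep this by splitting the balance between sites in advance.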

Notes
Notably, if \(O\) is a CRDT (i.e., \(O\) is a semilattice and every transaction \(t \in T\) is inflationary), then this periodic merging ensures that \(O\) is strongly eventually consistent [45].
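A minimal Python sketch of this note, assuming a grow-only counter (G-Counter) as the CRDT \(O\): the state maps each replica id to a count, the join is the pointwise maximum (so states form a semilattice), and the only transaction, increment, is inflationary. The names `merge`, `increment`, and `value` are hypothetical, not from the paper. Because the join is associative, commutative, and idempotent, merging replica states in any order yields the same final state.

```python
from functools import reduce
from itertools import permutations


def merge(a: dict, b: dict) -> dict:
    # Semilattice join: pointwise maximum of per-replica counts.
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}


def increment(state: dict, replica: str) -> dict:
    # Inflationary transaction: the result is >= the input in the lattice order.
    new = dict(state)
    new[replica] = new.get(replica, 0) + 1
    return new


def value(state: dict) -> int:
    return sum(state.values())


# Each replica runs increments locally, without coordinating.
replicas = {"a": {}, "b": {}, "c": {}}
for rid, n in [("a", 2), ("b", 1), ("c", 3)]:
    for _ in range(n):
        replicas[rid] = increment(replicas[rid], rid)

# Merge the replica states in every possible order: all orders agree.
finals = [reduce(merge, order, {}) for order in permutations(replicas.values())]
assert all(f == finals[0] for f in finals)
assert merge(finals[0], finals[0]) == finals[0]  # idempotence
print(value(finals[0]))  # prints 6
```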
Another small difference is that IsIclosed behaves differently in Algorithm 1 and Algorithm 3. In Algorithm 3, IsIclosed returns a triple (closed, \(s_1\), \(s_2\)). If closed is false, then \(s_1\) and \(s_2\) are two states, neither in NR, such that \(I(s_1)\) and \(I(s_2)\) hold but \(\lnot I(s_1 \sqcup s_2)\); that is, they form a counterexample to I-closedness. If no such states exist, then closed is true, and \(s_1\) and \(s_2\) are null.
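The triple-returning contract described in this note can be sketched as a brute-force search, under assumptions that are ours rather than the paper's: the state space is small enough to enumerate, states are finite sets whose join \(\sqcup\) is set union, the invariant \(I\) is a Python predicate, and NR is given explicitly as a set of states to skip. The names `is_i_closed`, `powerset`, and `no_1_and_2` are hypothetical, and enumeration like this only works for toy state spaces.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterable, List, Optional, Set, Tuple

State = FrozenSet[int]


def is_i_closed(
    states: Iterable[State],
    invariant: Callable[[State], bool],
    nr: Set[State],
) -> Tuple[bool, Optional[State], Optional[State]]:
    # Only I-valid states outside NR are candidates.
    candidates = [s for s in states if s not in nr and invariant(s)]
    # Join is commutative, so each unordered pair is checked once.
    for s1, s2 in combinations(candidates, 2):
        if not invariant(s1 | s2):  # join is set union in this sketch
            return False, s1, s2
    return True, None, None


def powerset(universe: Set[int]) -> List[State]:
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def no_1_and_2(s: State) -> bool:
    # Hypothetical invariant: 1 and 2 are never both present.
    return not {1, 2} <= s


states = powerset({1, 2, 3})
print(is_i_closed(states, no_1_and_2, nr=set()))
# prints: (False, frozenset({1}), frozenset({2}))

# If every state containing 1 is known to be unreachable, the check succeeds.
unreachable = {s for s in states if 1 in s}
print(is_i_closed(states, no_1_and_2, nr=unreachable))
# prints: (True, None, None)
```

The second call shows why NR matters: excluding states that can never be reached removes spurious counterexamples.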
References
[1] Abadi, D.: Consistency tradeoffs in modern distributed database system design: CAP is only part of the story. Computer 45(2), 37–42 (2012)
[2] Ahamad, M., Neiger, G., Burns, J.E., Kohli, P., Hutto, P.W.: Causal memory: definitions, implementation, and programming. Distrib. Comput. 9(1), 37–49 (1995)
[3] Alvaro, P., Ameloot, T.J., Hellerstein, J.M., Marczak, W., Van den Bussche, J.: A declarative semantics for Dedalus. Technical Report UCB/EECS-2011-120, EECS Department, University of California, Berkeley (2011)
[4] Alvaro, P., Condie, T., Conway, N., Elmeleegy, K., Hellerstein, J.M., Sears, R.: BOOM analytics: exploring data-centric, declarative programming for the cloud. In: Proceedings of the 5th European Conference on Computer Systems, pp. 223–236. ACM (2010)
[5] Alvaro, P., Conway, N., Hellerstein, J.M., Marczak, W.R.: Consistency analysis in Bloom: a CALM and collected approach. In: CIDR, pp. 249–260 (2011)
[6] Alvaro, P., Marczak, W.R., Conway, N., Hellerstein, J.M., Maier, D., Sears, R.: Dedalus: Datalog in time and space. In: Datalog Reloaded, pp. 262–281. Springer (2011)
[7] Ameloot, T.J., Neven, F., Van den Bussche, J.: Relational transducers for declarative networking. J. ACM (JACM) 60(2), 15 (2013)
[8] Bailis, P., Fekete, A., Franklin, M.J., Ghodsi, A., Hellerstein, J.M., Stoica, I.: Coordination avoidance in database systems. PVLDB 8(3), 185–196 (2014)
[9] Balegas, V., Duarte, S., Ferreira, C., Rodrigues, R., Preguiça, N., Najafzadeh, M., Shapiro, M.: Putting consistency back into eventual consistency. In: Proceedings of the Tenth European Conference on Computer Systems. ACM (2015)
[10] Balegas, V., Duarte, S., Ferreira, C., Rodrigues, R., Preguiça, N., Najafzadeh, M., Shapiro, M.: Towards fast invariant preservation in geo-replicated systems. ACM SIGOPS Oper. Syst. Rev. 49(1), 121–125 (2015)
[11] Barbará-Millá, D., Garcia-Molina, H.: The demarcation protocol: a technique for maintaining constraints in distributed database systems. VLDB J. 3(3), 325–353 (1994)
[12] Bernstein, P.A., Goodman, N.: Concurrency control in distributed database systems. ACM Comput. Surv. (CSUR) 13(2), 185–221 (1981)
[13] Brewer, E.: CAP twelve years later: how the “rules” have changed. Computer 45(2), 23–29 (2012)
[14] Cheung, A., Madden, S., Solar-Lezama, A., Arden, O., Myers, A.C.: Using program analysis to improve database applications. IEEE Data Eng. Bull. 37(1), 48–59 (2014)
[15] Conway, N., Marczak, W., Alvaro, P., Hellerstein, J.M., Maier, D.: Logic and lattices for distributed programming. Technical Report UCB/EECS-2012-167, EECS Department, University of California, Berkeley (2012)
[16] Crooks, N., Pu, Y., Estrada, N., Gupta, T., Alvisi, L., Clement, A.: TARDiS: a branch-and-merge approach to weak consistency. In: Proceedings of the 2016 International Conference on Management of Data, pp. 1615–1628. ACM (2016)
[17] De Moura, L., Bjørner, N.: Z3: an efficient SMT solver. In: International Conference on Tools and Algorithms for the Construction and Analysis of Systems, pp. 337–340. Springer (2008)
[18] DeCandia, G., Hastorun, D., Jampani, M., Kakulapati, G., Lakshman, A., Pilchin, A., Sivasubramanian, S., Vosshall, P., Vogels, W.: Dynamo: Amazon’s highly available key-value store. In: ACM SIGOPS Operating Systems Review, vol. 41, pp. 205–220. ACM (2007)
[19] Difallah, D.E., Pavlo, A., Curino, C., Cudre-Mauroux, P.: OLTP-Bench: an extensible testbed for benchmarking relational databases. PVLDB 7(4), 277–288 (2013)
[20] Gilbert, S., Lynch, N.: Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services. ACM SIGACT News 33(2), 51–59 (2002)
[21] Gotsman, A., Yang, H., Ferreira, C., Najafzadeh, M., Shapiro, M.: ’Cause I’m strong enough: reasoning about consistency choices in distributed systems. ACM SIGPLAN Notices 51(1), 371–384 (2016)
[22] Grefen, P.W., Apers, P.M.: Integrity control in relational database systems – an overview. Data Knowl. Eng. 10(2), 187–223 (1993)
[23] Gupta, A., Widom, J.: Local verification of global integrity constraints in distributed databases. ACM SIGMOD Rec. 22(2), 49–58 (1993)
[24] Hellerstein, J.M.: The declarative imperative: experiences and conjectures in distributed logic. ACM SIGMOD Rec. 39(1), 5–19 (2010)
[25] Herlihy, M.P., Wing, J.M.: Linearizability: a correctness condition for concurrent objects. ACM Trans. Program. Lang. Syst. (TOPLAS) 12(3), 463–492 (1990)
[26] Hoare, C.A.R.: An axiomatic basis for computer programming. Commun. ACM 12(10), 576–580 (1969)
[27] Kaki, G., Priya, S., Sivaramakrishnan, K., Jagannathan, S.: Mergeable replicated data types. Proc. ACM Program. Lang. 3(OOPSLA), 1–29 (2019)
[28] Kröning, D., Rümmer, P., Weissenbacher, G.: A proposal for a theory of finite sets, lists, and maps for the SMT-LIB standard. In: Informal Proceedings, 7th International Workshop on Satisfiability Modulo Theories at CADE, vol. 22 (2009)
[29] Lamport, L.: The part-time parliament. ACM Trans. Comput. Syst. 16(2), 133–169 (1998)
[30] Lamport, L.: Paxos made simple. ACM SIGACT News 32(4), 18–25 (2001)
[31] Li, C., Leitão, J., Clement, A., Preguiça, N., Rodrigues, R., Vafeiadis, V.: Automating the choice of consistency levels in replicated systems. In: 2014 USENIX Annual Technical Conference (USENIX ATC 14), pp. 281–292 (2014)
[32] Li, C., Porto, D., Clement, A., Gehrke, J., Preguiça, N., Rodrigues, R.: Making geo-replicated systems fast as possible, consistent when necessary. In: Presented as part of the 10th USENIX Symposium on Operating Systems Design and Implementation (OSDI 12), pp. 265–278 (2012)
[33] Lipton, R.J., Sandberg, J.S.: PRAM: a scalable shared memory. Technical Report TR-180-88, Computer Science Department, Princeton University (1988)
[34] Liskov, B., Cowling, J.: Viewstamped replication revisited. Technical Report MIT-CSAIL-TR-2012-021, CSAIL, Massachusetts Institute of Technology (2012)
[35] Lloyd, W., Freedman, M.J., Kaminsky, M., Andersen, D.G.: Don’t settle for eventual: scalable causal consistency for wide-area storage with COPS. In: Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles, pp. 401–416. ACM (2011)
[36] Mehdi, S.A., Littley, C., Crooks, N., Alvisi, L., Bronson, N., Lloyd, W.: I can’t believe it’s not causal! Scalable causal consistency with no slowdown cascades. In: NSDI, pp. 453–468 (2017)
[37] Mohan, C., Lindsay, B., Obermarck, R.: Transaction management in the R* distributed database management system. ACM Trans. Database Syst. (TODS) 11(4), 378–396 (1986)
[38] Mu, S., Nelson, L., Lloyd, W., Li, J.: Consolidating concurrency control and consensus for commits under conflicts. In: OSDI, pp. 517–532 (2016)
[39] O’Neil, P.E.: The escrow transactional method. ACM Trans. Database Syst. (TODS) 11(4), 405–430 (1986)
[40] Ongaro, D., Ousterhout, J.K.: In search of an understandable consensus algorithm. In: USENIX Annual Technical Conference, pp. 305–319 (2014)
[41] Ramachandra, K., Guravannavar, R., Sudarshan, S.: Program analysis and transformation for holistic optimization of database applications. In: Proceedings of the ACM SIGPLAN International Workshop on State of the Art in Java Program Analysis, pp. 39–44. ACM (2012)
[42] Roy, S., Kot, L., Bender, G., Ding, B., Hojjat, H., Koch, C., Foster, N., Gehrke, J.: The homeostasis protocol: avoiding transaction coordination through program analysis. In: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pp. 1311–1326. ACM (2015)
[43] Schneider, F.B.: Implementing fault-tolerant services using the state machine approach: a tutorial. ACM Comput. Surv. (CSUR) 22(4), 299–319 (1990)
[44] Shapiro, M., Preguiça, N., Baquero, C., Zawirski, M.: A comprehensive study of convergent and commutative replicated data types. Research Report RR-7506, Inria–Centre Paris-Rocquencourt; INRIA (2011)
[45] Shapiro, M., Preguiça, N., Baquero, C., Zawirski, M.: Conflict-free replicated data types. In: Symposium on Self-Stabilizing Systems, pp. 386–400. Springer (2011)
[46] Terry, D.B., Theimer, M.M., Petersen, K., Demers, A.J., Spreitzer, M.J., Hauser, C.H.: Managing update conflicts in Bayou, a weakly connected replicated storage system, vol. 29. ACM, New York (1995)
[47] Thomson, A., Diamond, T., Weng, S.-C., Ren, K., Shao, P., Abadi, D.J.: Calvin: fast distributed transactions for partitioned database systems. In: Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, pp. 1–12. ACM (2012)
[48] Vogels, W.: Eventually consistent. Commun. ACM 52(1), 40–44 (2009)
[49] Wu, C., Faleiro, J., Lin, Y., Hellerstein, J.: Anna: a KVS for any scale. IEEE Trans. Knowl. Data Eng. (2019)
[50] Wu, Y., Chan, C.-Y., Tan, K.-L.: Transaction healing: scaling optimistic concurrency control on multicores. In: Proceedings of the 2016 International Conference on Management of Data, pp. 1689–1704. ACM (2016)
[51] Zhang, I., Sharma, N.K., Szekeres, A., Krishnamurthy, A., Ports, D.R.: Building consistent transactions with inconsistent replication. In: Proceedings of the 25th Symposium on Operating Systems Principles, pp. 263–278. ACM (2015)
Acknowledgements
The authors would like to thank Alan Fekete, Alexandra Meliou, Alvin Cheung, Anthony Tan, Cristina Teodoropol, Peter Alvaro and Peter Bailis for fruitful discussion and feedback. This research is supported in part by DHS Award HSHQDC-16-3-00083, NSF CISE Expeditions Award CCF-1139158 and gifts from Alibaba, Amazon Web Services, Ant Financial, CapitalOne, Ericsson, GE, Google, Huawei, Intel, IBM, Microsoft, Scotiabank, Splunk and VMware.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Whittaker, M., Hellerstein, J.M. Interactive checks for coordination avoidance. The VLDB Journal 30, 71–92 (2021). https://doi.org/10.1007/s00778-020-00628-3