DOI: 10.1145/3578358.3591331
research-article
Open access

Probabilistic Causal Contexts for Scalable CRDTs

Published: 08 May 2023

Abstract

Conflict-free Replicated Data Types (CRDTs) allow a distributed system to keep operating on data even when network partitions occur, and thus preserve operational availability. Most CRDTs need to track whether data evolved concurrently at different nodes and must therefore be reconciled; this requires storing causality metadata whose size is proportional to the number of nodes. In this paper, we try to overcome this limitation by introducing a stochastic mechanism whose metadata size is no longer linear in the number of nodes, but whose accuracy is instead tied to how much divergence occurs between synchronizations. This provides a new tool that can be useful in deployments with many anonymous nodes and frequent synchronizations. However, there is an underlying trade-off with classic deterministic solutions: the approach is probabilistic, and its accuracy depends on the configurable size of the metadata space.
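To make the trade-off concrete, the following minimal sketch contrasts a classic causal context (a version vector with one entry per node) with a fixed-size probabilistic one. It assumes, for illustration only, that the probabilistic context is a plain Bloom filter over `(node, counter)` dots with `k` positions derived from SHA-256 by double hashing. The class names `VersionVector` and `BloomCausalContext`, the helper `false_positive_rate`, and the parameters `m` and `k` are hypothetical and are not taken from the paper's implementation.

```python
import hashlib
import math


class VersionVector:
    """Deterministic causal context: one counter per node, so size grows with node count."""

    def __init__(self):
        self.entries = {}  # node id -> highest counter seen from that node

    def add(self, node, counter):
        self.entries[node] = max(self.entries.get(node, 0), counter)

    def contains(self, node, counter):
        # Assumes dots from each node are received contiguously (no gaps),
        # so a dot is known iff its counter is <= the stored maximum.
        return counter <= self.entries.get(node, 0)

    def merge(self, other):
        # Pointwise maximum of the two vectors.
        for node, counter in other.entries.items():
            self.add(node, counter)


class BloomCausalContext:
    """Hypothetical probabilistic causal context: a fixed-size Bloom filter of dots."""

    def __init__(self, m=1024, k=4):
        self.m, self.k = m, k
        self.bits = 0  # m-bit array kept in a Python int; size does not depend on node count

    def _positions(self, node, counter):
        # Derive k bit positions from one SHA-256 digest of the dot (double hashing).
        digest = hashlib.sha256(f"{node}:{counter}".encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, node, counter):
        for pos in self._positions(node, counter):
            self.bits |= 1 << pos

    def contains(self, node, counter):
        # May report a dot that was never added (false positive), never the reverse.
        return all((self.bits >> pos) & 1 for pos in self._positions(node, counter))

    def merge(self, other):
        # Union of two filters built with the same m and k is a bitwise OR.
        self.bits |= other.bits


def false_positive_rate(m, k, n):
    """Standard Bloom-filter estimate after n insertions: (1 - e^(-kn/m))^k."""
    return (1 - math.exp(-k * n / m)) ** k
```

The version vector gains one entry per writing node, while the Bloom filter stays at `m` bits regardless of how many nodes exist; the price is a false-positive rate that grows with the number of dots `n` held in the filter and shrinks as `m` grows. In this simplified sketch `n` counts every dot ever inserted. A scheme whose accuracy depends on divergence between synchronizations would also need to discard dots that all replicas already know (for example with a time-decaying filter), which the sketch does not attempt.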


Information

Published In

PaPoC '23: Proceedings of the 10th Workshop on Principles and Practice of Consistency for Distributed Data
May 2023
89 pages
ISBN:9798400700866
DOI:10.1145/3578358
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. conflict-free replicated data types (CRDTs)
  2. bloom filters
  3. eventual consistency

Funding Sources

  • FCT

Conference

PaPoC '23

Acceptance Rates

Overall Acceptance Rate 34 of 47 submissions, 72%

Bibliometrics & Citations

Article Metrics

  • Total citations: 0
  • Total downloads: 209
  • Downloads (last 12 months): 111
  • Downloads (last 6 weeks): 19

Reflects downloads up to 17 Feb 2025.
