Idempotent distributed counters using a forgetful bloom filter

Cluster Computing

Abstract

Distributed key-value stores power the backend of high-performance web services and cloud computing applications. Key-value stores such as Cassandra rely heavily on counters to track occurrences of various kinds of events. However, modern implementations of counters do not provide exactly-once semantics. For example, a client may request a counter increment, time out waiting for a response, and resend the request, resulting in a double increment at the server. In this paper, we address this problem by presenting, analyzing, and evaluating a novel server-side data structure called the forgetful bloom filter (FBF). Like a traditional Bloom filter, an FBF is a compact representation of a set of elements (e.g., client requests). However, an FBF: (i) can forget older elements (e.g., requests that are too old to apply), and (ii) adapts itself to meet a desired false positive rate under a varying workload. We present experimental results from a prototype implementation of FBFs and an implementation of FBFs in the Cassandra key-value store. Our results show that the FBF is highly accurate in maintaining correct counter values.
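
To make the duplicate-request problem concrete, the following is a minimal Python sketch of a server-side counter that ignores request IDs it has already seen. It is not the FBF construction from the paper, whose internals the abstract does not describe. Instead it assumes a simpler stand-in: a fixed number of plain Bloom filter generations, with the oldest dropped on each rotation so that old requests are eventually forgotten. The names `BloomFilter`, `RotatingBloomFilter`, `IdempotentCounter`, and `rotate` are hypothetical, and the sketch omits the adaptive false positive control mentioned above.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter over strings, backed by a bytearray of bits."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive k bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


class RotatingBloomFilter:
    """Assumed stand-in for a forgetting filter (not the paper's FBF):
    a fixed number of generations, with the oldest dropped on rotate()."""

    def __init__(self, generations=2, num_bits=1024, num_hashes=3):
        self._make = lambda: BloomFilter(num_bits, num_hashes)
        self.filters = [self._make() for _ in range(generations)]

    def add(self, item):
        # New elements go into the newest generation only.
        self.filters[0].add(item)

    def __contains__(self, item):
        return any(item in f for f in self.filters)

    def rotate(self):
        # Forget the oldest generation and start a fresh newest one.
        self.filters.pop()
        self.filters.insert(0, self._make())


class IdempotentCounter:
    """Counter that applies each request ID at most once while it is remembered."""

    def __init__(self):
        self.value = 0
        self.seen = RotatingBloomFilter()

    def increment(self, request_id, delta=1):
        if request_id in self.seen:
            # Duplicate request, or a Bloom filter false positive: skip it.
            return False
        self.seen.add(request_id)
        self.value += delta
        return True


counter = IdempotentCounter()
counter.increment("req-1")  # applied
counter.increment("req-1")  # retry after a timeout: ignored
print(counter.value)
```

In this sketch a false positive causes a legitimate increment to be dropped (an under-count), while a request retried after its generation has been rotated out is applied again (an over-count). How often `rotate` is called is therefore a trade-off between memory use and how long retries stay protected.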

Acknowledgments

This work was supported in part by the following grants: NSF CNS 1409416, NSF CCF 0964471, NSF CNS 1319527, AFOSR/AFRL FA8750-11-2-0084, and a generous gift from Microsoft.

Author information

Corresponding author

Correspondence to Rajath Subramanyam.

About this article

Cite this article

Subramanyam, R., Gupta, I., Leslie, L.M. et al. Idempotent distributed counters using a forgetful bloom filter. Cluster Comput 19, 879–892 (2016). https://doi.org/10.1007/s10586-016-0567-8

  • DOI: https://doi.org/10.1007/s10586-016-0567-8
