Abstract
Distributed key-value stores power the backend of high-performance web services and cloud computing applications. Key-value stores such as Cassandra rely heavily on counters to track occurrences of various kinds of events. However, modern implementations of counters do not provide exactly-once semantics. For example, a client may request a counter increment, time out waiting for a response, and retry with a duplicate request, resulting in a double increment at the server. In this paper, we address this problem by presenting, analyzing, and evaluating a novel server-side data structure called the forgetful Bloom filter (FBF). Like a traditional Bloom filter, an FBF is a compact representation of a set of elements (e.g., client requests). However, an FBF: (i) can forget older elements (e.g., requests that are too old to apply), and (ii) adapts itself to meet a desired false positive rate under a varying workload. We present experimental results from a prototype implementation of FBFs and an implementation of FBFs in the Cassandra key-value store. Our results show that the FBF is highly accurate in maintaining correct counter values.
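To make the duplicate-increment problem concrete, the following is a minimal Python sketch of server-side deduplication with a filter that forgets old requests. It is an illustration under stated assumptions, not the construction evaluated in the paper: the class names (`BloomFilter`, `GenerationalFilter`, `IdempotentCounter`), the rotating-generations design, the parameter values, and the request identifiers are all hypothetical, and the sketch does not implement property (ii), adaptation to a target false positive rate.

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter over a fixed-size bit array."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item):
        # Derive num_hashes indexes by salting SHA-256 with the hash number.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        return all(self.bits[pos] for pos in self._positions(item))


class GenerationalFilter:
    """Hypothetical forgetting filter: rotating generations, newest first."""

    def __init__(self, generations=3, num_bits=1024, num_hashes=3):
        self._make = lambda: BloomFilter(num_bits, num_hashes)
        self.filters = [self._make() for _ in range(generations)]

    def seen(self, request_id):
        # A request counts as seen if any live generation contains it.
        return any(request_id in f for f in self.filters)

    def record(self, request_id):
        self.filters[0].add(request_id)

    def refresh(self):
        # Forget the oldest generation and open a fresh, empty newest one.
        self.filters.pop()
        self.filters.insert(0, self._make())


class IdempotentCounter:
    """Counter that applies each request identifier at most once while remembered."""

    def __init__(self):
        self.value = 0
        self.dedup = GenerationalFilter()

    def increment(self, request_id, delta=1):
        if self.dedup.seen(request_id):
            return False  # treated as a duplicate (or a false positive)
        self.dedup.record(request_id)
        self.value += delta
        return True


counter = IdempotentCounter()
first = counter.increment("req-1")   # original request is applied
retry = counter.increment("req-1")   # client retry after a timeout is dropped
```

In this sketch a retried request is rejected for as long as any generation still holds its identifier; once `refresh()` has been called as many times as there are generations, the identifier is forgotten and would be applied again. A false positive causes a legitimate increment to be dropped, which is why the false positive rate matters for counter accuracy.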

















Acknowledgments
This work was supported in part by the following grants: NSF CNS 1409416, NSF CCF 0964471, NSF CNS 1319527, AFOSR/AFRL FA8750-11-2-0084, and a generous gift from Microsoft.
Cite this article
Subramanyam, R., Gupta, I., Leslie, L.M. et al. Idempotent distributed counters using a forgetful bloom filter. Cluster Comput 19, 879–892 (2016). https://doi.org/10.1007/s10586-016-0567-8
DOI: https://doi.org/10.1007/s10586-016-0567-8