Abstract
The retrieval problem is the problem of associating data with keys in a set. Formally, the data structure must store a function \(f\colon U\to \{0,1\}^r\) that has specified values on the elements of a given set \(S \subseteq U\), \(|S| = n\), but may have any value on elements outside \(S\). All known methods (e.g., those based on perfect hash functions) induce a space overhead of \(\Theta(n)\) bits over the optimum, regardless of the evaluation time. We show that for any \(k\), query time \(O(k)\) can be achieved using space that is within a factor \(1 + e^{-k}\) of optimal, asymptotically for large \(n\). The expected time to construct the data structure is \(O(n)\). If we allow logarithmic evaluation time, the additive overhead can be reduced to \(O(\log\log n)\) bits with high probability. A general reduction transfers the results on retrieval into analogous results on approximate membership, a problem traditionally addressed using Bloom filters. Thus we obtain space bounds arbitrarily close to the lower bound for this problem as well. The evaluation procedures of our data structures are extremely simple. For the results stated above we assume free access to fully random hash functions. This assumption can be justified using space \(o(n)\) to simulate full randomness on a RAM.
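The abstract does not spell out the construction, so the following Python sketch is only a hedged illustration of how retrieval and approximate membership fit together. It assumes one common approach: each key is hashed to k table cells, its r-bit value is the XOR of those cells, and the table is found by solving the resulting sparse linear system over GF(2) with Gaussian elimination. All names (`_positions`, `_fingerprint`, `build_retrieval`, `retrieve`, `build_filter`, `might_contain`) are hypothetical, and hashes derived from Python's `random.Random` stand in for the fully random hash functions the abstract assumes. The fixed `overhead` parameter and the plain elimination are chosen for clarity; they do not attain the space factor or the expected \(O(n)\) construction time stated above.

```python
import random


def _positions(key, m, k, seed):
    """Hypothetical hash: k distinct cell indices in [0, m) for this key and seed."""
    rng = random.Random(f"pos:{seed}:{key}")
    return rng.sample(range(m), k)


def _fingerprint(key, r):
    """Hypothetical r-bit fingerprint hash, independent of the retrieval seed."""
    return random.Random(f"fp:{key}").getrandbits(r)


def build_retrieval(data, k=3, overhead=1.25, max_tries=100):
    """Find a table z so that the XOR of z over each key's k cells equals its value.

    Rows of the GF(2) system are m-bit int masks; the integer values are
    XORed alongside, so all r bits of a value are solved at once.
    """
    n = len(data)
    m = max(k, int(n * overhead) + 1)
    for seed in range(max_tries):
        pivots = {}  # highest set column -> (row mask, right-hand side)
        consistent = True
        for key, value in data.items():
            mask = 0
            for p in _positions(key, m, k, seed):
                mask |= 1 << p
            rhs = value
            # Eliminate the leading column until the row is new or vanishes.
            while mask:
                col = mask.bit_length() - 1
                if col not in pivots:
                    pivots[col] = (mask, rhs)
                    break
                pmask, prhs = pivots[col]
                mask ^= pmask
                rhs ^= prhs
            if mask == 0 and rhs != 0:
                consistent = False  # dependent row with a conflicting value
                break
        if not consistent:
            continue  # retry with fresh hash functions (next seed)
        # Back-substitution in increasing column order; free columns stay 0.
        z = [0] * m
        for col in sorted(pivots):
            pmask, acc = pivots[col]
            rest = pmask ^ (1 << col)
            while rest:
                j = rest.bit_length() - 1
                acc ^= z[j]
                rest ^= 1 << j
            z[col] = acc
        return {"seed": seed, "m": m, "k": k, "table": z}
    raise RuntimeError("no solvable system found; increase overhead")


def retrieve(struct, key):
    """Evaluate f(key): XOR of the k table cells the key hashes to."""
    z = struct["table"]
    out = 0
    for p in _positions(key, struct["m"], struct["k"], struct["seed"]):
        out ^= z[p]
    return out


def build_filter(keys, r=8, **kwargs):
    """Reduction: store an r-bit fingerprint of each key as its retrieval value."""
    struct = build_retrieval({x: _fingerprint(x, r) for x in keys}, **kwargs)
    struct["r"] = r
    return struct


def might_contain(filt, key):
    # Members always match; under idealized hashing a non-member
    # matches with probability about 2**-r.
    return retrieve(filt, key) == _fingerprint(key, filt["r"])


if __name__ == "__main__":
    keys = [f"key{i}" for i in range(200)]
    values = {x: i % 256 for i, x in enumerate(keys)}
    s = build_retrieval(values)
    assert all(retrieve(s, x) == v for x, v in values.items())
    f = build_filter(keys, r=8)
    hits = sum(might_contain(f, f"other{i}") for i in range(5000))
    print(f"observed false-positive rate {hits / 5000:.4f}, target about {2**-8:.4f}")
```

The `build_filter` part shows the reduction in its simplest assumed form: storing an r-bit fingerprint per key as the retrieved value gives an approximate membership structure with no false negatives, a false-positive rate of about \(2^{-r}\), and a table of m cells of r bits each.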
The main ideas for this paper were conceived while the authors were participating in the 2006 Seminar on Data Structures at IBFI Schloss Dagstuhl, Germany.
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Dietzfelbinger, M., Pagh, R. (2008). Succinct Data Structures for Retrieval and Approximate Membership (Extended Abstract). In: Aceto, L., Damgård, I., Goldberg, L.A., Halldórsson, M.M., Ingólfsdóttir, A., Walukiewicz, I. (eds) Automata, Languages and Programming. ICALP 2008. Lecture Notes in Computer Science, vol 5125. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-70575-8_32
DOI: https://doi.org/10.1007/978-3-540-70575-8_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-70574-1
Online ISBN: 978-3-540-70575-8
eBook Packages: Computer Science, Computer Science (R0)