Abstract
Recent advances in random linear systems on finite fields have paved the way for the construction of constant-time data structures representing static functions and minimal perfect hash functions using less space with respect to existing techniques. The main obstruction for any practical application of these results is the cubic-time Gaussian elimination required to solve these linear systems: despite they can be made very small, the computation is still too slow to be feasible.
In this paper we describe in detail a number of heuristics and programming techniques to speed up the resolution of these systems by several orders of magnitude, making the overall construction competitive with the standard and widely used MWHC technique, which is based on hypergraph peeling. In particular, we introduce broadword programming techniques for fast equation manipulation and a lazy Gaussian elimination algorithm. We also describe a number of technical improvements to the data structure which further reduce space usage and improve lookup speed.
Our implementation of these techniques yields a minimal perfect hash function data structure occupying 2.24 bits per element, compared to 2.68 for MWHC-based ones, and a static function data structure which reduces the multiplicative overhead from 1.23 to 1.03.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
Notes
- 1.
Technically, the proof in the paper is for \(k>15\), but the author claim that the result can be proved for \(k\ge 3\) with the same techniques, and in practice we never needed more than two attempts to generate a solvable system.
- 2.
References
Aumüller, M., Dietzfelbinger, M., Rink, M.: Experimental variations of a theoretically good retrieval data structure. In: Fiat, A., Sanders, P. (eds.) ESA 2009. LNCS, vol. 5757, pp. 742–751. Springer, Heidelberg (2009)
Belazzougui, D., Boldi, P., Ottaviano, G., Venturini, R., Vigna, S.: Cache-oblivious peeling of random hypergraphs. In: 2014 Data Compression Conference (DCC 2014), pp. 352–361. IEEE (2014)
Belazzougui, D., Boldi, P., Pagh, R., Vigna, S.: Monotone minimal perfect hashing: searching a sorted table with \(O(1)\) accesses. In: Proceedings of the 20th Annual ACM-SIAM Symposium on Discrete Mathematics (SODA), pp. 785–794. ACM, New York (2009)
Belazzougui, D., Boldi, P., Pagh, R., Vigna, S.: Fast prefix search in little space, with applications. In: de Berg, M., Meyer, U. (eds.) ESA 2010, Part I. LNCS, vol. 6346, pp. 427–438. Springer, Heidelberg (2010)
Belazzougui, D., Boldi, P., Pagh, R., Vigna, S.: Theory and practice of monotone minimal perfect hashing. ACM J. Exp. Algorithm. 16(3), 3.2:1–3.2:26 (2011)
Belazzougui, D., Botelho, F.C., Dietzfelbinger, M.: Hash, displace, and compress. In: Fiat, A., Sanders, P. (eds.) ESA 2009. LNCS, vol. 5757, pp. 682–693. Springer, Heidelberg (2009)
Belazzougui, D., Navarro, G.: Alphabet-independent compressed text indexing. In: Demetrescu, C., Halldórsson, M.M. (eds.) ESA 2011. LNCS, vol. 6942, pp. 748–759. Springer, Heidelberg (2011)
Belazzougui, D., Venturini, R.: Compressed static functions with applications. In: SODA, pp. 229–240 (2013)
Boldi, P., Marino, A., Santini, M., Vigna, S.: BUbiNG: massive crawling for the masses. In: Proceedings of the Companion Publication of the 23rd International Conference on World Wide Web Companion, WWW Companion 2014, pp. 227–228. International World Wide Web Conferences Steering Committee (2014)
Botelho, F.C., Pagh, R., Ziviani, N.: Practical perfect hashing in nearly optimal space. Inf. Syst. 38(1), 108–131 (2013)
Carter, L., Floyd, R., Gill, J., Markowsky, G., Wegman, M.: Exact and approximate membership testers. In: Proceedings of Symposium on Theory of Computation (STOC 1978), pp. 59–65. ACM Press (1978)
Chazelle, B., Kilian, J., Rubinfeld, R., Tal, A.: The Bloomier filter: an efficient data structure for static support lookup tables. In: Munro, J.I. (ed.) Proceedings of the Fifteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2004, pp. 30–39. SIAM (2004)
Dietzfelbinger, M., Goerdt, A., Mitzenmacher, M., Montanari, A., Pagh, R., Rink, M.: Tight thresholds for cuckoo hashing via XORSAT. In: Abramsky, S., Gavoille, C., Kirchner, C., Meyer auf der Heide, F., Spirakis, P.G. (eds.) ICALP 2010. LNCS, vol. 6198, pp. 213–225. Springer, Heidelberg (2010)
Dietzfelbinger, M., Pagh, R.: Succinct data structures for retrieval and approximate membership (extended abstract). In: Aceto, L., Damgård, I., Goldberg, L.A., Halldórsson, M.M., Ingólfsdóttir, A., Walukiewicz, I. (eds.) ICALP 2008, Part I. LNCS, vol. 5125, pp. 385–396. Springer, Heidelberg (2008)
Fountoulakis, N., Panagiotou, K.: Sharp load thresholds for cuckoo hashing. Random Struct. Algorithms 41(3), 306–333 (2012)
Frieze, A.M., Melsted, P.: Maximum matchings in random bipartite graphs and the space utilization of cuckoo hash tables. Random Struct. Algorithms 41(3), 334–364 (2012)
Goerdt, A., Falke, L.: Satisfiability thresholds beyond k \(-\)XORSAT. In: Hirsch, E.A., Karhumäki, J., Lepistö, A., Prilutskii, M. (eds.) CSR 2012. LNCS, vol. 7353, pp. 148–159. Springer, Heidelberg (2012)
Knuth, D.E.: The Art of Computer Programming. Pre-Fascicle 1A. Draft of Section 7.1.3: Bitwise Tricks and Techniques (2007)
LaMacchia, B.A., Odlyzko, A.M.: Solving large sparse linear systems over finite fields. In: Menezes, A., Vanstone, S.A. (eds.) CRYPTO 1990. LNCS, vol. 537, pp. 109–133. Springer, Heidelberg (1991)
Majewski, B.S., Wormald, N.C., Havas, G., Czech, Z.J.: A family of perfect hashing methods. Comput. J. 39(6), 547–554 (1996)
Odlyzko, A.M.: Discrete logarithms in finite fields and their cryptographic significance. In: Beth, T., Cot, N., Ingemarsson, I. (eds.) EUROCRYPT 1984. LNCS, vol. 209, pp. 224–314. Springer, Heidelberg (1985)
Rink, M.: Thresholds for matchings in random bipartite graphs with applications to hashing-based data structures. Ph.D. thesis, Technische Universität Ilmenau (2015)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Genuzio, M., Ottaviano, G., Vigna, S. (2016). Fast Scalable Construction of (Minimal Perfect Hash) Functions. In: Goldberg, A., Kulikov, A. (eds) Experimental Algorithms. SEA 2016. Lecture Notes in Computer Science(), vol 9685. Springer, Cham. https://doi.org/10.1007/978-3-319-38851-9_23
Download citation
DOI: https://doi.org/10.1007/978-3-319-38851-9_23
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-38850-2
Online ISBN: 978-3-319-38851-9
eBook Packages: Computer ScienceComputer Science (R0)