Fast Scalable Construction of (Minimal Perfect Hash) Functions

  • Conference paper
Experimental Algorithms (SEA 2016)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9685)

Abstract

Recent advances in random linear systems on finite fields have paved the way for the construction of constant-time data structures representing static functions and minimal perfect hash functions using less space than existing techniques. The main obstacle to any practical application of these results is the cubic-time Gaussian elimination required to solve these linear systems: although the systems can be made very small, the computation is still too slow to be feasible.
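As a concrete illustration of this setting, here is a minimal sketch (our own, not the paper's code) of a static function stored as the solution of a random linear system over GF(2). Each key x selects k = 3 cells through hash functions and contributes the equation a[h_0(x)] ⊕ a[h_1(x)] ⊕ a[h_2(x)] = f(x); a lookup simply XORs the k selected cells. The hashing (`hashlib.blake2b` with a seed), the table size m, and the names `cells`, `solve_gf2`, `build` and `lookup` are illustrative assumptions. Rows are Python integers used as bit vectors, so adding two equations is a single XOR; the solver is textbook Gauss-Jordan elimination, i.e., the cubic-time step the paper sets out to speed up, not its lazy variant.

```python
import hashlib
from typing import Dict, List, Optional, Tuple


def cells(key: str, m: int, k: int, seed: int) -> List[int]:
    """Map a key to k distinct cells in [0, m) (illustrative hashing; needs m >= k)."""
    out: List[int] = []
    i = 0
    while len(out) < k:
        digest = hashlib.blake2b(f"{seed}:{i}:{key}".encode(), digest_size=8).digest()
        c = int.from_bytes(digest, "little") % m
        if c not in out:
            out.append(c)
        i += 1
    return out


def solve_gf2(rows: List[int], rhs: List[int], m: int) -> Optional[List[int]]:
    """Gauss-Jordan elimination over GF(2).

    Each row is a bitmask over the m variables; adding two equations is one XOR
    of the masks (and of the right-hand sides). Returns one solution, with free
    variables set to 0, or None if the system is inconsistent.
    """
    rows, rhs = rows[:], rhs[:]
    pivots: List[int] = []
    r = 0
    for col in range(m):
        p = next((i for i in range(r, len(rows)) if (rows[i] >> col) & 1), None)
        if p is None:
            continue  # no remaining equation mentions this variable: it is free
        rows[r], rows[p] = rows[p], rows[r]
        rhs[r], rhs[p] = rhs[p], rhs[r]
        for i in range(len(rows)):
            if i != r and (rows[i] >> col) & 1:
                rows[i] ^= rows[r]
                rhs[i] ^= rhs[r]
        pivots.append(col)
        r += 1
    if any(rhs[i] for i in range(r, len(rows))):
        return None  # an equation of the form 0 = nonzero survived
    a = [0] * m
    for i, col in enumerate(pivots):
        a[col] = rhs[i]
    return a


def build(values: Dict[str, int], m: int, k: int = 3,
          max_tries: int = 20) -> Tuple[int, List[int]]:
    """Try successive seeds until the random system is solvable."""
    keys = list(values)
    for seed in range(max_tries):
        rows = []
        for key in keys:
            mask = 0
            for c in cells(key, m, k, seed):
                mask |= 1 << c
            rows.append(mask)
        a = solve_gf2(rows, [values[key] for key in keys], m)
        if a is not None:
            return seed, a
    raise RuntimeError("no solvable system found")


def lookup(key: str, a: List[int], seed: int, k: int = 3) -> int:
    """Evaluate the function: XOR of the k cells selected by the key."""
    v = 0
    for c in cells(key, len(a), k, seed):
        v ^= a[c]
    return v


fruit = {"apple": 3, "banana": 1, "cherry": 2, "date": 0}
seed, table = build(fruit, m=8)
assert all(lookup(key, table, seed) == v for key, v in fruit.items())
```

If a system is inconsistent (for instance, two keys hit the same cells but map to different values), `build` simply retries with a new seed.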

In this paper we describe in detail a number of heuristics and programming techniques to speed up the resolution of these systems by several orders of magnitude, making the overall construction competitive with the standard and widely used MWHC technique, which is based on hypergraph peeling. In particular, we introduce broadword programming techniques for fast equation manipulation and a lazy Gaussian elimination algorithm. We also describe a number of technical improvements to the data structure which further reduce space usage and improve lookup speed.
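The MWHC baseline mentioned above avoids elimination altogether: it peels the 3-hypergraph whose edges are the keys' cell triples (repeatedly removing an edge that owns a degree-1 vertex), then assigns cells in reverse peeling order, all in linear time. Below is a minimal sketch under our own assumptions: the hashing, the XOR-of-incident-edges bookkeeping and the names `cells`, `peel`, `build_mwhc` and `lookup` are illustrative choices, not taken from the paper. For k = 3, peeling succeeds with high probability only with about 1.23 cells per key, which matches the MWHC overhead quoted in the abstract.

```python
import hashlib
from collections import deque
from typing import Dict, List, Optional, Tuple


def cells(key: str, m: int, k: int, seed: int) -> List[int]:
    """Map a key to k distinct vertices in [0, m) (illustrative hashing; needs m >= k)."""
    out: List[int] = []
    i = 0
    while len(out) < k:
        digest = hashlib.blake2b(f"{seed}:{i}:{key}".encode(), digest_size=8).digest()
        c = int.from_bytes(digest, "little") % m
        if c not in out:
            out.append(c)
        i += 1
    return out


def peel(edges: List[List[int]], m: int) -> Optional[List[Tuple[int, int]]]:
    """Return (edge, hinge vertex) pairs in peeling order, or None if a 2-core remains."""
    degree = [0] * m
    incident_xor = [0] * m  # XOR of incident edge indices; equals the edge when degree == 1
    for e, vs in enumerate(edges):
        for v in vs:
            degree[v] += 1
            incident_xor[v] ^= e
    queue = deque(v for v in range(m) if degree[v] == 1)
    order: List[Tuple[int, int]] = []
    while queue:
        v = queue.popleft()
        if degree[v] != 1:
            continue  # its last edge was already peeled through another vertex
        e = incident_xor[v]
        order.append((e, v))
        for u in edges[e]:
            degree[u] -= 1
            incident_xor[u] ^= e
            if degree[u] == 1:
                queue.append(u)
    return order if len(order) == len(edges) else None


def build_mwhc(values: Dict[str, int], m: int, k: int = 3,
               max_tries: int = 20) -> Tuple[int, List[int]]:
    """Assign cells in reverse peeling order so that each key's XOR equals its value."""
    keys = list(values)
    for seed in range(max_tries):
        edges = [cells(key, m, k, seed) for key in keys]
        order = peel(edges, m)
        if order is None:
            continue  # unpeelable hypergraph: retry with a new seed
        a = [0] * m
        for e, hinge in reversed(order):
            v = values[keys[e]]
            for u in edges[e]:
                if u != hinge:
                    v ^= a[u]
            a[hinge] = v  # no edge assigned after this one touches the hinge
        return seed, a
    raise RuntimeError("hypergraph could not be peeled")


def lookup(key: str, a: List[int], seed: int, k: int = 3) -> int:
    """Evaluate the function: XOR of the k cells selected by the key."""
    v = 0
    for c in cells(key, len(a), k, seed):
        v ^= a[c]
    return v
```

Peeling replaces a cubic solve with a linear scan, but it needs noticeably more cells per key than a system that is merely solvable, which is the space gap the abstract's numbers quantify.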

Our implementation of these techniques yields a minimal perfect hash function data structure occupying 2.24 bits per element, compared to 2.68 for MWHC-based ones, and a static function data structure which reduces the multiplicative overhead from 1.23 to 1.03.
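To put these figures in perspective, the following back-of-the-envelope computation converts them into total sizes. The key count n = 10^9 and the 8-bit value width are hypothetical, and we assume a static function occupies (overhead × value width) bits per key, ignoring lower-order terms.

```python
def total_megabytes(bits_per_key: float, n: int) -> float:
    """Total size in megabytes (10**6 bytes) for n keys at bits_per_key bits each."""
    return bits_per_key * n / 8 / 1e6


def relative_saving(old: float, new: float) -> float:
    """Fraction of space saved when moving from old to new."""
    return 1 - new / old


N = 10**9  # hypothetical number of keys
R = 8      # hypothetical value width in bits for the static function

mphf_mwhc, mphf_new = 2.68, 2.24    # bits per key, from the abstract
sf_mwhc, sf_new = 1.23 * R, 1.03 * R  # assumed: overhead times value width

print(f"MPHF:   {total_megabytes(mphf_mwhc, N):.0f} MB -> {total_megabytes(mphf_new, N):.0f} MB "
      f"({relative_saving(mphf_mwhc, mphf_new):.1%} less)")
# MPHF:   335 MB -> 280 MB (16.4% less)
print(f"Static: {total_megabytes(sf_mwhc, N):.0f} MB -> {total_megabytes(sf_new, N):.0f} MB "
      f"({relative_saving(sf_mwhc, sf_new):.1%} less)")
# Static: 1230 MB -> 1030 MB (16.3% less)
```

In both cases the reduction is roughly one sixth of the MWHC-based space.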

Notes

  1. Technically, the proof in the paper is for \(k>15\), but the authors claim that the result can be proved for \(k\ge 3\) with the same techniques, and in practice we never needed more than two attempts to generate a solvable system.

  2. http://law.di.unimi.it/.

Author information

Corresponding author

Correspondence to Sebastiano Vigna.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Genuzio, M., Ottaviano, G., Vigna, S. (2016). Fast Scalable Construction of (Minimal Perfect Hash) Functions. In: Goldberg, A., Kulikov, A. (eds) Experimental Algorithms. SEA 2016. Lecture Notes in Computer Science, vol 9685. Springer, Cham. https://doi.org/10.1007/978-3-319-38851-9_23

  • DOI: https://doi.org/10.1007/978-3-319-38851-9_23

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-38850-2

  • Online ISBN: 978-3-319-38851-9

  • eBook Packages: Computer Science, Computer Science (R0)
