Abstract
Numbers and strings are two kinds of objects manipulated by most programs. Hashing has been well studied for numbers and has proved effective in practice. In contrast, basic hashing issues for strings remain largely unexplored. In this paper, we identify and formulate the core hashing problem for strings, which we call substring hashing. Our main technical results are highly efficient sequential and parallel (CRCW PRAM) Las Vegas-type algorithms that determine a perfect hash function for substring hashing. For example, given a binary string of length n, one of our algorithms finds a perfect hash function in O(log n) time, O(n) work, and O(n) space; the hash value of any substring can then be computed in O(log log n) time using a single processor. Our approach relies on a novel use of the suffix tree of a string. In implementing our approach, we design optimal parallel algorithms for the problem of determining weighted ancestors on an edge-weighted tree, which may be of independent interest.
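To make the problem statement concrete, the following is a minimal, deliberately naive Python sketch of a perfect hash for substrings: equal substrings receive equal values and distinct substrings receive distinct values. It is not the authors' algorithm. It assumes an uncompressed suffix trie, so preprocessing costs O(n^2) time and space and a query costs O(j - i) time, far from the bounds stated above, which rely on the suffix tree and weighted-ancestor queries. The class `SuffixTrieHash` and its methods are hypothetical names chosen for illustration.

```python
from typing import Dict, List


class SuffixTrieHash:
    """Naive perfect hash for substrings (hypothetical illustration).

    Every non-root node of the uncompressed suffix trie of s corresponds to
    exactly one distinct non-empty substring of s, so the node ids give an
    injective map from distinct substrings onto 1..D, where D is the number
    of distinct non-empty substrings.
    """

    def __init__(self, s: str) -> None:
        self.s = s
        # children[v] maps a character to a child node id; node 0 is the root.
        self.children: List[Dict[str, int]] = [{}]
        for i in range(len(s)):
            self._insert_suffix(i)

    def _insert_suffix(self, i: int) -> None:
        # Walk down from the root along s[i:], creating missing nodes.
        v = 0
        for ch in self.s[i:]:
            nxt = self.children[v].get(ch)
            if nxt is None:
                nxt = len(self.children)
                self.children.append({})
                self.children[v][ch] = nxt
            v = nxt

    def num_distinct(self) -> int:
        # Non-root nodes are in one-to-one correspondence with distinct substrings.
        return len(self.children) - 1

    def value(self, i: int, j: int) -> int:
        """Hash of s[i:j] for 0 <= i < j <= len(s); lies in 1..num_distinct()."""
        if not 0 <= i < j <= len(self.s):
            raise ValueError("need 0 <= i < j <= len(s)")
        v = 0
        for ch in self.s[i:j]:
            v = self.children[v][ch]  # edge always exists: s[i:j] occurs in s
        return v


if __name__ == "__main__":
    h = SuffixTrieHash("abab")
    print(h.num_distinct())              # 7: a, b, ab, ba, aba, bab, abab
    print(h.value(0, 2) == h.value(2, 4))  # True: both are "ab"
```

Because the values fill the range 1..D exactly, this naive map is in fact a minimal perfect hash; the difficulty the paper addresses is achieving such a map with linear preprocessing and sublogarithmic query time.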
Supported by NSF Career Development Award CCR-9501942 and an Alfred P. Sloan Research Fellowship.
Partly supported by DIMACS (Center for Discrete Mathematics and Theoretical Computer Science), and partly supported by ALCOM IT.
Copyright information
© 1996 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Farach, M., Muthukrishnan, S. (1996). Perfect hashing for strings: Formalization and algorithms. In: Hirschberg, D., Myers, G. (eds) Combinatorial Pattern Matching. CPM 1996. Lecture Notes in Computer Science, vol 1075. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-61258-0_11
DOI: https://doi.org/10.1007/3-540-61258-0_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-61258-2
Online ISBN: 978-3-540-68390-2
eBook Packages: Springer Book Archive