Abstract
We present a new sublinear-size index structure for q-grams. A q-gram index of the text is used in many approximate pattern matching algorithms. All earlier q-gram indexes have at least linear size. The new method takes advantage of repetitions in the text found by Lempel-Ziv parsing.
This work was supported by the Academy of Finland.
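To make the terms in the abstract concrete, here is a minimal Python sketch of the two ingredients it names. It assumes a plain dictionary-based q-gram index (the kind of linear-size index the abstract contrasts against) and a simple greedy Lempel-Ziv-style parse. The function names `build_qgram_index` and `lz_parse` are illustrative choices, not names from the paper, and the sketch does not reproduce the paper's sublinear-size structure. It only shows what a q-gram index stores and how a parse exposes repetitions in the text.

```python
from collections import defaultdict


def build_qgram_index(text: str, q: int) -> dict[str, list[int]]:
    """Map every q-gram of `text` to the increasing list of its start positions.

    This is the straightforward index: it stores one position per text
    character, so its size grows linearly with the text.
    """
    index = defaultdict(list)
    for i in range(len(text) - q + 1):
        index[text[i:i + q]].append(i)
    return dict(index)


def lz_parse(text: str) -> list[str]:
    """Greedy Lempel-Ziv-style parse (illustrative variant, quadratic time).

    Each phrase is the longest prefix of the remaining text that also occurs
    starting at an earlier position (overlap allowed), or a single character
    if no such occurrence exists.
    """
    phrases = []
    i, n = 0, len(text)
    while i < n:
        length = 0
        # Extend while text[i:i+length+1] has an occurrence starting before i.
        while i + length < n and text.find(text[i:i + length + 1], 0, i + length) != -1:
            length += 1
        if length == 0:
            length = 1  # previously unseen character forms its own phrase
        phrases.append(text[i:i + length])
        i += length
    return phrases


if __name__ == "__main__":
    sample = "abababab"
    print(build_qgram_index(sample, 2))
    print(lz_parse(sample))
```

On a highly repetitive text the parse produces few phrases while the plain index still stores a position for every q-gram occurrence, which is the gap that an index built on the parse can try to exploit.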
Copyright information
© 1996 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kärkkäinen, J., Sutinen, E. (1996). Lempel-Ziv index for q-grams. In: Diaz, J., Serna, M. (eds) Algorithms — ESA '96. ESA 1996. Lecture Notes in Computer Science, vol 1136. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-61680-2_69
DOI: https://doi.org/10.1007/3-540-61680-2_69
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-61680-1
Online ISBN: 978-3-540-70667-0
eBook Packages: Springer Book Archive