Lempel-Ziv index for q-grams

  • Conference paper
Algorithms — ESA '96 (ESA 1996)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1136)

Abstract

We present a new sublinear-size index structure for q-grams. A q-gram index of the text is used in many approximate pattern matching algorithms. All earlier q-gram indexes have at least linear size. The new method takes advantage of repetitions in the text found by Lempel-Ziv parsing.
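For orientation, the following is a minimal sketch of the conventional, linear-size kind of q-gram index that the abstract contrasts against. It is not the Lempel-Ziv structure proposed in the paper. The function names `build_qgram_index` and `occurrences` are hypothetical and chosen only for illustration. The sketch stores one position per q-gram occurrence, which is why its size grows at least linearly with the text.

```python
from collections import defaultdict


def build_qgram_index(text, q):
    """Map every q-gram of `text` to the list of its start positions.

    Baseline illustration only: one entry is stored per text position,
    so the index has len(text) - q + 1 entries in total.
    """
    index = defaultdict(list)
    for i in range(len(text) - q + 1):
        index[text[i:i + q]].append(i)  # positions are appended in increasing order
    return dict(index)


def occurrences(index, qgram):
    """Return all start positions of `qgram`, or an empty list if it is absent."""
    return index.get(qgram, [])


index = build_qgram_index("abracadabra", 3)
print(occurrences(index, "abr"))  # [0, 7]
```

In the example text the substring "abra" occurs twice, so the q-grams "abr" and "bra" each have two stored positions. Repetitions of this kind are what a Lempel-Ziv parsing detects, and the abstract states that the new index exploits them to get below linear size.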

This work was supported by the Academy of Finland.

Editor information

Josep Diaz, Maria Serna

Copyright information

© 1996 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kärkkäinen, J., Sutinen, E. (1996). Lempel-Ziv index for q-grams. In: Diaz, J., Serna, M. (eds) Algorithms — ESA '96. ESA 1996. Lecture Notes in Computer Science, vol 1136. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-61680-2_69

  • DOI: https://doi.org/10.1007/3-540-61680-2_69

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-61680-1

  • Online ISBN: 978-3-540-70667-0

  • eBook Packages: Springer Book Archive
