Trade Off Between Compression and Search Times in Compact Suffix Array

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2153)

Abstract

The suffix array is a widely used full-text index that allows fast searches on the text. It is constructed by sorting all suffixes of the text in lexicographic order and storing pointers to the suffixes in this order; searches on the suffix array are then performed with binary search. The compact suffix array is a compressed form of the suffix array that still allows binary searches, but its search times depend on the degree of compression. In this paper, we answer some open questions concerning the compact suffix array and study practical issues, such as the trade-off between compression and search times, and we show how to reduce the space requirement of the construction. We provide experimental results comparing the compact suffix array with other search methods. The results show that the size of a compact suffix array is usually less than twice the size of the text, while its search times remain comparable to those of the suffix array.
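The suffix array construction and binary search described in the abstract, together with the kind of redundancy a compact suffix array exploits, can be illustrated with a small Python sketch. This is not the paper's construction or search algorithm: the naive sort, the `$` end sentinel, and the helper names `build_suffix_array`, `find_occurrences` and `repetitive_areas` are hypothetical choices made for illustration. `repetitive_areas` reports maximal areas SA[i..i+l-1] whose entries equal SA[j..j+l-1] minus one, that is, areas that could in principle be replaced by a link to another area; this simplified view is an assumption based on the general idea of a compact suffix array, not a description of the paper's exact representation.

```python
def build_suffix_array(text: str) -> list[int]:
    # Naive construction (illustration only): sort suffix start positions
    # by the suffixes they point to. Worst case O(n^2 log n).
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    # Length-m prefixes of sorted suffixes are themselves in sorted order,
    # so two binary searches delimit the interval of suffixes starting with pattern.
    m = len(pattern)
    lo, hi = 0, len(sa)
    while lo < hi:  # first index whose prefix is >= pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:  # first index whose prefix is > pattern
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


def repetitive_areas(sa: list[int]) -> list[tuple[int, int, int]]:
    # Hypothetical helper: find maximal areas (i, j, length) such that
    # sa[i + t] == sa[j + t] - 1 for all t < length, with length >= 2.
    n = len(sa)
    isa = [0] * n  # inverse suffix array: isa[position] = rank
    for rank, pos in enumerate(sa):
        isa[pos] = rank
    # psi[i]: rank of the suffix one character shorter than suffix sa[i];
    # None for the last text position, which has no shorter suffix.
    psi = [isa[p + 1] if p + 1 < n else None for p in sa]
    areas = []
    i = 0
    while i < n:
        if psi[i] is None:
            i += 1
            continue
        length = 1
        # Extend while consecutive ranks map to consecutive ranks.
        while i + length < n and psi[i + length] == psi[i] + length:
            length += 1
        if length > 1:
            areas.append((i, psi[i], length))
        i += length
    return areas


if __name__ == "__main__":
    text = "banana$"  # '$' assumed to sort before all letters
    sa = build_suffix_array(text)
    print(sa)                                   # [6, 5, 3, 1, 0, 4, 2]
    print(find_occurrences(text, sa, "ana"))    # [1, 3]
    print(repetitive_areas(sa))                 # [(2, 5, 2), (5, 1, 2)]
```

In this toy example, the area SA[2..3] = [3, 1] can be rebuilt from SA[5..6] = [4, 2] by subtracting one from each entry, and SA[5..6] can in turn be rebuilt from SA[1..2] = [5, 3]. Chains of such links are one plausible way to picture why, in a compressed representation, search times depend on how aggressively the array is compressed.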

This work was supported by the Academy of Finland under grant 22584.

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mäkinen, V. (2001). Trade Off Between Compression and Search Times in Compact Suffix Array. In: Buchsbaum, A.L., Snoeyink, J. (eds) Algorithm Engineering and Experimentation. ALENEX 2001. Lecture Notes in Computer Science, vol 2153. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44808-X_16

  • DOI: https://doi.org/10.1007/3-540-44808-X_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42560-1

  • Online ISBN: 978-3-540-44808-2

  • eBook Packages: Springer Book Archive