Abstract
The suffix array is a widely used full-text index that supports fast searches on the text. It is constructed by sorting all suffixes of the text in lexicographic order and storing pointers to the suffixes in that order; searches are then performed by binary search over this array. The compact suffix array is a compressed form of the suffix array that still allows binary searches, although its search times also depend on the degree of compression. In this paper, we answer some open questions concerning the compact suffix array, study practical issues such as the trade-off between compression and search times, and show how to reduce the space required during its construction. Experimental results compare the compact suffix array with other search methods. They show that a compact suffix array is usually less than twice the size of the text, while its search times remain comparable to those of the suffix array.
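The sketch below illustrates the plain suffix array described in the abstract: sorting the suffixes and then searching with binary search. It does not show the compact suffix array or the paper's implementation. The function names `build_suffix_array` and `find_occurrences` are hypothetical. Construction is a deliberately naive sort of full suffix copies rather than a fast suffix-sorting algorithm.

```python
from typing import List


def build_suffix_array(text: str) -> List[int]:
    """Return suffix start positions of `text` in lexicographic order of their suffixes.

    Naive construction: each sort key is a full suffix copy, so this takes
    O(n^2 log n) time in the worst case and O(n^2) extra memory. Illustration only.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: List[int], pattern: str) -> List[int]:
    """Return the start positions of `pattern` in `text`, in increasing order.

    Suffixes that begin with `pattern` form one contiguous block of `sa`,
    so two binary searches find its left and right boundaries.
    """
    m = len(pattern)
    n = len(sa)

    # Left boundary: first rank whose suffix, truncated to m chars, is >= pattern.
    lo, hi = 0, n
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    left = lo

    # Right boundary: first rank whose truncated suffix is > pattern.
    hi = n
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # sa[left:lo] holds exactly the suffixes that start with `pattern`.
    return sorted(sa[left:lo])


if __name__ == "__main__":
    text = "banana"
    sa = build_suffix_array(text)
    print(sa)                                 # [5, 3, 1, 0, 4, 2]
    print(find_occurrences(text, sa, "ana"))  # [1, 3]
```

Truncating sorted suffixes to the pattern length preserves their order. Each binary search therefore costs O(m log n) character comparisons for a pattern of length m. The compact suffix array must still support this kind of binary search on its compressed representation.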
This work was supported by the Academy of Finland under grant 22584.
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Mäkinen, V. (2001). Trade Off Between Compression and Search Times in Compact Suffix Array. In: Buchsbaum, A.L., Snoeyink, J. (eds) Algorithm Engineering and Experimentation. ALENEX 2001. Lecture Notes in Computer Science, vol 2153. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44808-X_16
DOI: https://doi.org/10.1007/3-540-44808-X_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42560-1
Online ISBN: 978-3-540-44808-2
eBook Packages: Springer Book Archive