Abstract
We describe an algorithm that, for any v ∈ [2, n], constructs the suffix array of a string of length n in \( \mathcal{O}\left( {vn + n log{\mathbf{ }}n} \right) \) time using \( \mathcal{O}\left( {v + n/\sqrt v } \right) \) space in addition to the input (the string) and the output (the suffix array). By setting v = log n, we obtain an \( \mathcal{O}\left( {n log n} \right) \) time algorithm using \( \mathcal{O}\left( {n/\sqrt {log n} } \right) \) extra space. This solves the open problem stated by Manzini and Ferragina [ESA ’02] of whether there exists a lightweight (sublinear extra space) \( \mathcal{O}\left( {n log n} \right) \) time algorithm. The key idea of the algorithm is to first sort a sample of suffixes chosen using mathematical constructs called difference covers. The algorithm is not only lightweight but also fast in practice as demonstrated by experiments. Additionally, we describe fast and lightweight suffix array checkers, i.e., algorithms that check the correctness of a suffix array.
Partially supported by the Future and Emerging Technologies programme of the EU under contract number IST-1999-14186 (ALCOM-FT).
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
M. I. Abouelhoda, S. Kurtz, and E. Ohlebusch. The enhanced suffix array and its applications to genome analysis. In Proc. 2nd Workshop on Algorithms in Bioinformatics, volume 2452 of LNCS, pages 449–463. Springer, 2002.
A. Andersson, N. J. Larsson, and K. Swanson. Suffix trees on words. Algorithmica, 23(3):246–260, 1999.
J. L. Bentley and R. Sedgewick. Fast algorithms for sorting and searching strings. In Proc. 8th Annual Symposium on Discrete Algorithms, pages 360–369. ACM, 1997.
M. Blum and S. Kannan. Designing programs that check their work. J. ACM, 42(1):269–291, Jan. 1995.
M. Burrows and D. J. Wheeler. A block-sorting lossless data compression algorithm. Technical Report 124, SRC (digital, Palo Alto), May 1994.
R. Clifford. Distributed and paged suffix trees for large genetic databases. In Proc. 14th Annual Symposium on Combinatorial Pattern Matching. Springer, 2003. This volume.
C. J. Colbourn and A. C. H. Ling. Quorums from difference covers. Inf. Process. Lett., 75(1–2):9–12, July 2000.
M. Farach. Optimal suffix tree construction with large alphabets. In Proc. 38th Annual Symposium on Foundations of Computer Science, pages 137–143. IEEE, 1997.
G. Gonnet, R. Baeza-Yates, and T. Snider. New indices for text: PAT trees and PAT arrays. In W. B. Frakes and R. Baeza-Yates, editors, Information Retrieval: Data Structures & Algorithms. Prentice-Hall, 1992.
H. Itoh and H. Tanaka. An efficient method for in memory construction of suffix arrays. In Proc. 6th Symposium on String Processing and Information Retrieval, pages 125–136. IEEE, 1999.
J. Kärkkäinen and P. Sanders. Simple linear work suffix array construction. In Proc. 13th International Conference on Automata, Languages and Programming. Springer, 2003. To appear.
J. Kärkkäinen and E. Ukkonen. Sparse suffix trees. In Proc. 2nd Annual International Conference on Computing and Combinatorics, volume 1090 of LNCS, pages 219–230. Springer, 1996.
R. M. Karp, R. E. Miller, and A. L. Rosenberg. Rapid identification of repeated patterns in strings, trees and arrays. In Proc. 4th Annual Symposium on Theory of Computing, pages 125–136. ACM, 1972.
T. Kasai, G. Lee, H. Arimura, S. Arikawa, and K. Park. Linear-time longest-common-prefix computation in suffix arrays and its applications. In Proc. 12th Annual Symposium on Combinatorial Pattern Matching, volume 2089 of LNCS, pages 181–192. Springer, 2001.
J. Kilian, S. Kipnis, and C. E. Leiserson. The organization of permutation architectures with bused interconnections. IEEE Transactions on Computers, 39(11):1346–1358, Nov. 1990.
D. K. Kim, J. S. Sim, H. Park, and K. Park. Linear-time construction of suffix arrays. In Proc. 14th Annual Symposium on Combinatorial Pattern Matching. Springer, 2003. This volume.
P. Ko and S. Aluru. Linear time construction of suffix arrays. In Proc. 14th Annual Symposium on Combinatorial Pattern Matching. Springer, 2003. This volume.
S. Kurtz. Reducing the space requirement of suffix trees. Software — Practice and Experience, 29(13):1149–1171, 1999.
N. J. Larsson and K. Sadakane. Faster suffix sorting. Technical report LU-CSTR: 99-214, Dept. of Computer Science, Lund University, Sweden, 1999.
W.-S. Luk and T.-T. Wong. Two new quorum based algorithms for distributed mutual exclusion. In Proc. 17th International Conference on Distributed Computing Systems, pages 100–106. IEEE, 1997.
U. Manber and G. Myers. Suffix arrays: A new method for on-line string searches. SIAM J. Comput., 22(5):935–948, Oct. 1993.
G. Manzini and P. Ferragina. Engineering a lightweight suffix array construction algorithm. In Proc. 10th Annual European Symposium on Algorithms, volume 2461 of LNCS, pages 698–710. Springer, 2002.
E. M. McCreight. A space-economic suffix tree construction algorithm. J. ACM, 23(2):262–272, 1976.
J. Seward. On the performance of BWT sorting algorithms. In Proc. Data Compression Conference, pages 173–182. IEEE, 2000.
J. Seward. The bzip2 and libbzip2 official home page, 2002. http://sources.redhat.com/bzip2/.
E. Ukkonen. On-line construction of suffix trees. Algorithmica, 14(3):249–260, 1995.
H. Wasserman and M. Blum. Software reliability via run-time result-checking. J. ACM, 44(6):826–849, Nov. 1997.
P. Weiner. Linear pattern matching algorithm. In Proc. 14th Symposium on Switching and Automata Theory, pages 1–11. IEEE, 1973.
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Burkhardt, S., Kärkkäinen, J. (2003). Fast Lightweight Suffix Array Construction and Checking. In: Baeza-Yates, R., Chávez, E., Crochemore, M. (eds) Combinatorial Pattern Matching. CPM 2003. Lecture Notes in Computer Science, vol 2676. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44888-8_5
Download citation
DOI: https://doi.org/10.1007/3-540-44888-8_5
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40311-1
Online ISBN: 978-3-540-44888-4
eBook Packages: Springer Book Archive