Skip to main content

Fast Lightweight Suffix Array Construction and Checking

  • Conference paper
  • First Online:
Combinatorial Pattern Matching (CPM 2003)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 2676))

Included in the following conference series:

Abstract

We describe an algorithm that, for any v ∈ [2, n], constructs the suffix array of a string of length n in \( \mathcal{O}\left( {vn + n log{\mathbf{ }}n} \right) \) time using \( \mathcal{O}\left( {v + n/\sqrt v } \right) \) space in addition to the input (the string) and the output (the suffix array). By setting v = log n, we obtain an \( \mathcal{O}\left( {n log n} \right) \) time algorithm using \( \mathcal{O}\left( {n/\sqrt {log n} } \right) \) extra space. This solves the open problem stated by Manzini and Ferragina [ESA ’02] of whether there exists a lightweight (sublinear extra space) \( \mathcal{O}\left( {n log n} \right) \) time algorithm. The key idea of the algorithm is to first sort a sample of suffixes chosen using mathematical constructs called difference covers. The algorithm is not only lightweight but also fast in practice as demonstrated by experiments. Additionally, we describe fast and lightweight suffix array checkers, i.e., algorithms that check the correctness of a suffix array.

Partially supported by the Future and Emerging Technologies programme of the EU under contract number IST-1999-14186 (ALCOM-FT).

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. M. I. Abouelhoda, S. Kurtz, and E. Ohlebusch. The enhanced suffix array and its applications to genome analysis. In Proc. 2nd Workshop on Algorithms in Bioinformatics, volume 2452 of LNCS, pages 449–463. Springer, 2002.

    Chapter  Google Scholar 

  2. A. Andersson, N. J. Larsson, and K. Swanson. Suffix trees on words. Algorithmica, 23(3):246–260, 1999.

    Article  MATH  MathSciNet  Google Scholar 

  3. J. L. Bentley and R. Sedgewick. Fast algorithms for sorting and searching strings. In Proc. 8th Annual Symposium on Discrete Algorithms, pages 360–369. ACM, 1997.

    Google Scholar 

  4. M. Blum and S. Kannan. Designing programs that check their work. J. ACM, 42(1):269–291, Jan. 1995.

    Article  MATH  Google Scholar 

  5. M. Burrows and D. J. Wheeler. A block-sorting lossless data compression algorithm. Technical Report 124, SRC (digital, Palo Alto), May 1994.

    Google Scholar 

  6. R. Clifford. Distributed and paged suffix trees for large genetic databases. In Proc. 14th Annual Symposium on Combinatorial Pattern Matching. Springer, 2003. This volume.

    Google Scholar 

  7. C. J. Colbourn and A. C. H. Ling. Quorums from difference covers. Inf. Process. Lett., 75(1–2):9–12, July 2000.

    Article  MathSciNet  Google Scholar 

  8. M. Farach. Optimal suffix tree construction with large alphabets. In Proc. 38th Annual Symposium on Foundations of Computer Science, pages 137–143. IEEE, 1997.

    Google Scholar 

  9. G. Gonnet, R. Baeza-Yates, and T. Snider. New indices for text: PAT trees and PAT arrays. In W. B. Frakes and R. Baeza-Yates, editors, Information Retrieval: Data Structures & Algorithms. Prentice-Hall, 1992.

    Google Scholar 

  10. H. Itoh and H. Tanaka. An efficient method for in memory construction of suffix arrays. In Proc. 6th Symposium on String Processing and Information Retrieval, pages 125–136. IEEE, 1999.

    Google Scholar 

  11. J. Kärkkäinen and P. Sanders. Simple linear work suffix array construction. In Proc. 13th International Conference on Automata, Languages and Programming. Springer, 2003. To appear.

    Google Scholar 

  12. J. Kärkkäinen and E. Ukkonen. Sparse suffix trees. In Proc. 2nd Annual International Conference on Computing and Combinatorics, volume 1090 of LNCS, pages 219–230. Springer, 1996.

    Google Scholar 

  13. R. M. Karp, R. E. Miller, and A. L. Rosenberg. Rapid identification of repeated patterns in strings, trees and arrays. In Proc. 4th Annual Symposium on Theory of Computing, pages 125–136. ACM, 1972.

    Google Scholar 

  14. T. Kasai, G. Lee, H. Arimura, S. Arikawa, and K. Park. Linear-time longest-common-prefix computation in suffix arrays and its applications. In Proc. 12th Annual Symposium on Combinatorial Pattern Matching, volume 2089 of LNCS, pages 181–192. Springer, 2001.

    Google Scholar 

  15. J. Kilian, S. Kipnis, and C. E. Leiserson. The organization of permutation architectures with bused interconnections. IEEE Transactions on Computers, 39(11):1346–1358, Nov. 1990.

    Article  Google Scholar 

  16. D. K. Kim, J. S. Sim, H. Park, and K. Park. Linear-time construction of suffix arrays. In Proc. 14th Annual Symposium on Combinatorial Pattern Matching. Springer, 2003. This volume.

    Google Scholar 

  17. P. Ko and S. Aluru. Linear time construction of suffix arrays. In Proc. 14th Annual Symposium on Combinatorial Pattern Matching. Springer, 2003. This volume.

    Google Scholar 

  18. S. Kurtz. Reducing the space requirement of suffix trees. Software — Practice and Experience, 29(13):1149–1171, 1999.

    Article  Google Scholar 

  19. N. J. Larsson and K. Sadakane. Faster suffix sorting. Technical report LU-CSTR: 99-214, Dept. of Computer Science, Lund University, Sweden, 1999.

    Google Scholar 

  20. W.-S. Luk and T.-T. Wong. Two new quorum based algorithms for distributed mutual exclusion. In Proc. 17th International Conference on Distributed Computing Systems, pages 100–106. IEEE, 1997.

    Google Scholar 

  21. U. Manber and G. Myers. Suffix arrays: A new method for on-line string searches. SIAM J. Comput., 22(5):935–948, Oct. 1993.

    Article  MATH  MathSciNet  Google Scholar 

  22. G. Manzini and P. Ferragina. Engineering a lightweight suffix array construction algorithm. In Proc. 10th Annual European Symposium on Algorithms, volume 2461 of LNCS, pages 698–710. Springer, 2002.

    Google Scholar 

  23. E. M. McCreight. A space-economic suffix tree construction algorithm. J. ACM, 23(2):262–272, 1976.

    Article  MATH  MathSciNet  Google Scholar 

  24. J. Seward. On the performance of BWT sorting algorithms. In Proc. Data Compression Conference, pages 173–182. IEEE, 2000.

    Google Scholar 

  25. J. Seward. The bzip2 and libbzip2 official home page, 2002. http://sources.redhat.com/bzip2/.

  26. E. Ukkonen. On-line construction of suffix trees. Algorithmica, 14(3):249–260, 1995.

    Article  MATH  MathSciNet  Google Scholar 

  27. H. Wasserman and M. Blum. Software reliability via run-time result-checking. J. ACM, 44(6):826–849, Nov. 1997.

    Article  MATH  MathSciNet  Google Scholar 

  28. P. Weiner. Linear pattern matching algorithm. In Proc. 14th Symposium on Switching and Automata Theory, pages 1–11. IEEE, 1973.

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Burkhardt, S., Kärkkäinen, J. (2003). Fast Lightweight Suffix Array Construction and Checking. In: Baeza-Yates, R., Chávez, E., Crochemore, M. (eds) Combinatorial Pattern Matching. CPM 2003. Lecture Notes in Computer Science, vol 2676. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44888-8_5

Download citation

  • DOI: https://doi.org/10.1007/3-540-44888-8_5

  • Published:

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40311-1

  • Online ISBN: 978-3-540-44888-4

  • eBook Packages: Springer Book Archive

Publish with us

Policies and ethics