Skip to main content

Broadword Computing and Fibonacci Code Speed Up Compressed Suffix Arrays

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 5526))

Abstract

The suffix array of a string s of length n over the alphabet Σ is the permutation that gives us the lexicographic order of all suffixes of s. This popular index can be used to solve many problems in sequence analysis. In practice, one limitation of this data structure is its size of n logn bits, while the size of the text is n log|Σ| bits. For this reason compressed suffix arrays (CSAs) were introduced. The size of these CSAs is asymptotically less than or equal to the text size if the text is compressible, while maintaining O(logε n) access time to the elements (0 < ε ≤ 1). The goal of a good CSA implementation is to provide fast access time to the elements while using minimal space for the CSA. Both access time and space depend on the choice of a self-delimiting code for compression. We show that the Fibonacci code is superior to the Elias δ code for strings that are low compressible. Our second contribution are two new broadword methods that support the decoding of Fibonacci encoded numbers on 64 bit architectures. Furthermore, our experiments show that the use of known broadword methods speed up the decoding of Elias δ code for strings that are high compressible, like XML. Finally, we provide a new efficient C++ library for succinct data structures which includes a generic CSA for further experiments.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Abouelhoda, M.I., Kurtz, S., Ohlebusch, E.: Replacing suffix trees with enhanced suffix arrays. J. Discrete Algorithms 2(1), 53–83 (2004)

    Article  MathSciNet  MATH  Google Scholar 

  2. Elias, P.: Universal code word sets and representations of the integers. IEEE Transactions on Information Theory 21(2), 194–203 (1975)

    Article  MathSciNet  MATH  Google Scholar 

  3. Ferragina, P., González, R., Navarro, G., Venturini, R.: Compressed text indexes: From theory to practice! arXiv:0712.3360v1 [cs.DS] (2007)

    Google Scholar 

  4. Ferragina, P., Manzini, G.: Opportunistic data structures with applications. In: FOCS, pp. 390–398 (2000)

    Google Scholar 

  5. Grossi, R., Vitter, J.S.: Compressed suffix arrays and suffix trees with applications to text indexing and string matching. SIAM J. Comput. 35(2), 378–407 (2005)

    Article  MathSciNet  MATH  Google Scholar 

  6. Jacobson, G.: Space-efficient static trees and graphs. In: FOCS, pp. 549–554. IEEE, Los Alamitos (1989)

    Google Scholar 

  7. Knuth, D.E.: The Art of Computer Programming. Pre-Fascicle 1a. A Draft of Section 7.1.3: Bitwise Tricks and Techniques (2008)

    Google Scholar 

  8. Lakos, J.: Large-Scale C++ Software Design. Eddison-Wesley (1996)

    Google Scholar 

  9. Mäkinen, V., Navarro, G.: Compressed compact suffix arrays. In: Sahinalp, S.C., Muthukrishnan, S.M., Dogrusoz, U. (eds.) CPM 2004. LNCS, vol. 3109, pp. 420–433. Springer, Heidelberg (2004)

    Chapter  Google Scholar 

  10. Manber, U., Myers, E.W.: Suffix arrays: A new method for on-line string searches. SIAM J. Comput. 22(5), 935–948 (1993)

    Article  MathSciNet  MATH  Google Scholar 

  11. Ian Munro, J.: Tables. In: Chandru, V., Vinay, V. (eds.) FSTTCS 1996. LNCS, vol. 1180, pp. 37–42. Springer, Heidelberg (1996)

    Chapter  Google Scholar 

  12. Navarro, G., Mäkinen, V.: Compressed full-text indexes. ACM Comput. Surv. 39(1) (2007)

    Google Scholar 

  13. Rahman, N., Raman, R.: Rank and select operations on binary strings. In: Encyclopedia of Algorithms. Springer, Heidelberg (2008)

    Google Scholar 

  14. Sadakane, K.: New text indexing functionalities of the compressed suffix arrays. J. Algorithms 48(2), 294–313 (2003)

    Article  MathSciNet  MATH  Google Scholar 

  15. Vigna, S.: Broadword implementation of rank/select queries. In: McGeoch, C.C. (ed.) WEA 2008. LNCS, vol. 5038, pp. 154–168. Springer, Heidelberg (2008)

    Chapter  Google Scholar 

  16. Witten, I.H., Moffat, A., Bell, T.C.: Managing Gigabytes, 2nd edn. Morgan Kaufmann Publishers, San Francisco (1999)

    MATH  Google Scholar 

  17. Zeckendorf, E.: Représentation des nombres naturels par une somme de nombres de Fibonacci ou de nombres de Lucas. Bull. Soc. R. Sci. Liège 41, 179–182 (1972)

    MathSciNet  MATH  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gog, S. (2009). Broadword Computing and Fibonacci Code Speed Up Compressed Suffix Arrays. In: Vahrenhold, J. (eds) Experimental Algorithms. SEA 2009. Lecture Notes in Computer Science, vol 5526. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02011-7_16

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-02011-7_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-02010-0

  • Online ISBN: 978-3-642-02011-7

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics