Skip to main content

Access, Rank, and Select in Grammar-compressed Strings

  • Conference paper
  • First Online:
Algorithms - ESA 2015

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 9294))

Abstract

Given a string S of length N on a fixed alphabet of σ symbols, a grammar compressor produces a context-free grammar G of size n that generates S and only S. In this paper we describe data structures to support the following operations on a grammar-compressed string: access(S,i,j) (return substring S[i,j]), rank c (S,i) (return the number of occurrences of symbol c before position i in S), and select c (S,i) (return the position of the ith occurrence of c in S). Our main result for access is a method that requires \(\O(n\log N)\) bits of space and \(\O(\log N+m/\log_\sigma N)\) time to extract m = j − i + 1 consecutive symbols from S. Alternatively, we can achieve \(\O(\log_\tau N+m/\log_\sigma N)\) query time using \(\O(n\tau\log_\tau (N/n)\log N)\) bits of space, matching a lower bound stated by Verbin and Yu for strings where N is polynomially related to n when τ = logε N. For rank and select we describe data structures of size \(\O(n\sigma\log N)\) bits that support the two operations in \(\O(\log N)\) time. We also extend our other structure to support both operations in \(\O(\log_\tau N)\) time using \(\O(n\tau\sigma\log_\tau (N/n)\log N)\) bits of space. When τ = logε N the query time is O(logN/loglogN) and we provide a hardness result showing that significantly improving this would imply a major breakthrough on a hard graph-theoretical problem.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 84.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Bannai, H., Gagie, T.I., Inenaga, S., Landau, G.M., Lewenstein, M.: An efficient algorithm to test square-freeness of strings compressed by straight-line programs. Information Processing Letters 112(19), 711–714 (2012)

    Article  MathSciNet  MATH  Google Scholar 

  2. Bille, P., Cording, P.H., Gørtz, I.L.: Compressed subsequence matching and packed tree coloring. In: Kulikov, A.S., Kuznetsov, S.O., Pevzner, P. (eds.) CPM 2014. LNCS, vol. 8486, pp. 40–49. Springer, Heidelberg (2014)

    Google Scholar 

  3. Bille, P., Landau, G.M., Raman, R., Sadakane, K., Satti, S.R., Weimann, O.: Random access to grammar-compressed strings. In: Proc. 22nd SODA, pp. 373–389. SIAM (2011)

    Google Scholar 

  4. Bille, P., Cording, P.H., Gørtz, I.L., Sach, B., Vildhøj, H.W., Vind, S.: Fingerprints in compressed strings. In: Dehne, F., Solis-Oba, R., Sack, J.-R. (eds.) WADS 2013. LNCS, vol. 8037, pp. 146–157. Springer, Heidelberg (2013)

    Chapter  Google Scholar 

  5. Charikar, M., Lehman, E., Liu, D., Panigrahy, R., Prabhakaran, M., Sahai, A., Shelat, A.: The smallest grammar problem. IEEE Transactions on Information Theory 51(7), 2554–2576 (2005)

    Article  MathSciNet  MATH  Google Scholar 

  6. Cohen, E., Halperin, E., Kaplan, H., Zwick, U.: Reachability and distance queries via 2-hop labels. SIAM Journal on Computing 32(5), 1338–1355 (2003)

    Article  MathSciNet  MATH  Google Scholar 

  7. Ferragina, P., Venturini, R.: A simple storage scheme for strings achieving entropy bounds. Theoretical Computer Science 372(1), 115–121 (2007)

    Article  MathSciNet  MATH  Google Scholar 

  8. Fredman, M.L., Willard, D.E.: Blasting through the information theoretic barrier with fusion trees. In: Proceedings of the Twenty-Second Annual ACM Symposium on Theory of Computing, pp. 1–7. ACM (1990)

    Google Scholar 

  9. Fredman, M.L., Willard, D.E.: Surpassing the information theoretic bound with fusion trees. Journal of Computer and System Sciences 47(3), 424–436 (1993)

    Article  MathSciNet  MATH  Google Scholar 

  10. Golynski, A., Munro, J.I., Rao, S.S.: Rank/select operations on large alphabets: a tool for text indexing. In: Proc. 17th SODA, pp. 368–373. SIAM (2006)

    Google Scholar 

  11. Grossi, R., Gupta, A., Vitter, J.S.: High-order entropy-compressed text indexes. In: Proc. 14th SODA, pp. 841–850. SIAM (2003)

    Google Scholar 

  12. Hon, W.K., Patil, M., Shah, R., Thankachan, S.V., Vitter, J.S.: Indexes for document retrieval with relevance. In: Brodnik, A., López-Ortiz, A., Raman, V., Viola, A. (eds.) Ianfest-66. LNCS, vol. 8066, pp. 351–362. Springer, Heidelberg (2013)

    Chapter  Google Scholar 

  13. I, T., Matsubara, W., Shimohira, K., Inenaga, S., Bannai, H., Takeda, M., Narisawa, K., Shinohara, A.: Detecting regularities on grammar-compressed strings. In: Chatterjee, K., Sgall, J. (eds.) MFCS 2013. LNCS, vol. 8087, pp. 571–582. Springer, Heidelberg (2013)

    Chapter  Google Scholar 

  14. I, T., Nakashima, Y., Inenaga, S., Bannai, H., Takeda, M.: Faster lyndon factorization algorithms for SLP and LZ78 compressed text. In: Kurland, O., Lewenstein, M., Porat, E. (eds.) SPIRE 2013. LNCS, vol. 8214, pp. 174–185. Springer, Heidelberg (2013)

    Chapter  Google Scholar 

  15. Inenaga, S., Bannai, H.: Finding characteristic substrings from compressed texts. International Journal of Foundations of Computer Science 23(2), 261–280 (2012)

    Article  MathSciNet  MATH  Google Scholar 

  16. Karp, R.M., Rabin, M.O.: Efficient randomized pattern-matching algorithms. IBM Journal of Research and Development 31(2), 249–260 (1987)

    Article  MathSciNet  MATH  Google Scholar 

  17. Karpinski, M., Rytter, W., Shinohara, A.: An efficient pattern-matching algorithm for strings with short descriptions. Nordic Journal of Computing 4, 172–186 (1997)

    MathSciNet  MATH  Google Scholar 

  18. Manzini, G.: An analysis of the Burrows-Wheeler transform. Journal of the ACM 48(3), 407–430 (2001)

    Article  MathSciNet  MATH  Google Scholar 

  19. Munro, J.I.: Tables. In: Chandru, V., Vinay, V. (eds.) FSTTCS 1996. LNCS, vol. 1180, pp. 37–42. Springer, Heidelberg (1996)

    Chapter  Google Scholar 

  20. Navarro, G.: Indexing highly repetitive collections. In: Smyth, B. (ed.) IWOCA 2012. LNCS, vol. 7643, pp. 274–279. Springer, Heidelberg (2012)

    Google Scholar 

  21. Navarro, G.: Spaces, trees and colors: The algorithmic landscape of document retrieval on sequences. ACM Computing Surveys 46(4), article 52, 47 pages (2014)

    Google Scholar 

  22. Navarro, G., Mäkinen, V.: Compressed full-text indexes. ACM Computing Surveys 39(1), article 2 (2007)

    Google Scholar 

  23. Navarro, G., Ordóñez, A.: Grammar compressed sequences with rank/Select support. In: Moura, E., Crochemore, M. (eds.) SPIRE 2014. LNCS, vol. 8799, pp. 31–44. Springer, Heidelberg (2014)

    Google Scholar 

  24. Okanohara, D., Sadakane, K.: Practical entropy-compressed rank/select dictionary. In: Proc. 9th ALENEX, pp. 60–70. SIAM (2007)

    Google Scholar 

  25. Raman, R., Raman, V., Rao, S.S.: Succinct indexable dictionaries with applications to encoding k-ary trees, prefix sums and multisets. ACM Transactions on Algorithms 3(4) (2007)

    Google Scholar 

  26. Rytter, W.: Application of Lempel-Ziv factorization to the approximation of grammar-based compression. Theor. Comp. Sci. 302(1–3), 211–222 (2003)

    Article  MathSciNet  MATH  Google Scholar 

  27. Verbin, E., Yu, W.: Data structure lower bounds on random access to grammar-compressed strings. In: Fischer, J., Sanders, P. (eds.) CPM 2013. LNCS, vol. 7922, pp. 247–258. Springer, Heidelberg (2013)

    Chapter  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Djamal Belazzougui .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2015 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Belazzougui, D., Cording, P.H., Puglisi, S.J., Tabei, Y. (2015). Access, Rank, and Select in Grammar-compressed Strings. In: Bansal, N., Finocchi, I. (eds) Algorithms - ESA 2015. Lecture Notes in Computer Science(), vol 9294. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-48350-3_13

Download citation

  • DOI: https://doi.org/10.1007/978-3-662-48350-3_13

  • Published:

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-48349-7

  • Online ISBN: 978-3-662-48350-3

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics