
Smaller Fully-Functional Bidirectional BWT Indexes

  • Conference paper
String Processing and Information Retrieval (SPIRE 2020)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12303)

Abstract

Burrows-Wheeler indexes that support both extending and contracting any substring of the text T of length n on which they are built, in any direction, provide substantial flexibility in traversing the text and can be used to implement several algorithms. The practical appeal of such indexes is contingent on them being compact, and current designs that are sensitive to the compressibility of the input take either \(O(e+\overline{e})\) words of space, where e and \(\overline{e}\) are the number of right and left extensions of the maximal repeats of T, or \(O(r\log (n/r)+\overline{r}\log (n/\overline{r}))\) words, where r and \(\overline{r}\) are the number of runs in the Burrows-Wheeler transform of T and of its reverse. In this paper we describe a fully-functional bidirectional index that takes \(O(m+r+\overline{r})\) words, where m is the number of maximal repeats of T, as well as a variant that takes \(O(r+\overline{r})\) words.
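To make the quantities r and \(\overline{r}\) in the space bounds concrete, the following minimal sketch counts the runs in the Burrows-Wheeler transform of a small string and of its reverse. It is an illustration only, not the paper's construction: the naive rotation sort, the `$` terminator convention, and the function names `bwt`, `count_runs` and `run_counts` are all assumptions made here.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Naive BWT: last column of the sorted rotations of text + sentinel.

    Assumption: the sentinel does not occur in text and sorts before
    every other character (true for '$' against ASCII letters).
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in text")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def count_runs(s: str) -> int:
    """Number of maximal runs of equal characters in s."""
    return sum(1 for i, c in enumerate(s) if i == 0 or c != s[i - 1])


def run_counts(text: str) -> tuple[int, int]:
    """Return (r, r_bar): runs in the BWT of text and of its reverse."""
    return count_runs(bwt(text)), count_runs(bwt(text[::-1]))


if __name__ == "__main__":
    print(bwt("banana"))         # annb$aa
    print(run_counts("banana"))  # (5, 4)
```

For "banana" the forward transform is `annb$aa` (5 runs) and the transform of the reverse is `bnn$aaa` (4 runs). On highly repetitive inputs both counts tend to be much smaller than n, which is what makes bounds expressed in r and \(\overline{r}\) attractive.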

Notes

  1. \(\mathtt{select}_{a}(S,i)\) is the well-known select operation, which returns the position of the i-th occurrence of character a in string S, and C[a] for \(a \in [0..\sigma ]\) contains the number of occurrences of all characters smaller than a in lexicographic order.
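The two primitives in this note can be sketched as follows. This is a naive linear-time illustration rather than the succinct data structures an actual index would use; the 0-based positions, the 1-based rank i, the dictionary-based C array and the names `select` and `build_c` are assumptions made for the example.

```python
from collections import Counter
from typing import Optional


def select(s: str, a: str, i: int) -> Optional[int]:
    """0-based position of the i-th (1-based) occurrence of character a in s.

    Returns None when s contains fewer than i occurrences of a.
    """
    if i < 1:
        raise ValueError("rank i must be at least 1")
    pos = -1
    for _ in range(i):
        pos = s.find(a, pos + 1)
        if pos == -1:
            return None
    return pos


def build_c(s: str) -> dict[str, int]:
    """C[a] = number of characters in s that are strictly smaller than a."""
    counts = Counter(s)
    c: dict[str, int] = {}
    total = 0
    for ch in sorted(counts):
        c[ch] = total
        total += counts[ch]
    return c


if __name__ == "__main__":
    s = "annb$aa"
    print(select(s, "a", 2))  # 5
    print(build_c(s))         # {'$': 0, 'a': 1, 'b': 4, 'n': 5}
```

In a compressed index both operations are answered by rank/select dictionaries in constant or near-constant time; the scan above only fixes the semantics.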

Acknowledgements

We thank Timothy Chan for insights on static weighted 2D orthogonal range counting, and Gene Myers and the Myers’ lab for hosting and fruitful discussions.

Author information

Corresponding author

Correspondence to Fabio Cunial.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Belazzougui, D., Cunial, F. (2020). Smaller Fully-Functional Bidirectional BWT Indexes. In: Boucher, C., Thankachan, S.V. (eds) String Processing and Information Retrieval. SPIRE 2020. Lecture Notes in Computer Science, vol 12303. Springer, Cham. https://doi.org/10.1007/978-3-030-59212-7_4

  • DOI: https://doi.org/10.1007/978-3-030-59212-7_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-59211-0

  • Online ISBN: 978-3-030-59212-7

  • eBook Packages: Computer Science, Computer Science (R0)
