Abstract
Burrows-Wheeler indexes that support both extending and contracting any substring of the text T of length n on which they are built, in any direction, provide substantial flexibility in traversing the text and can be used to implement several algorithms. The practical appeal of such indexes depends on their being compact. Current designs that are sensitive to the compressibility of the input take either \(O(e+\overline{e})\) words of space, where e and \(\overline{e}\) are the numbers of right and left extensions of the maximal repeats of T, or \(O(r\log (n/r)+\overline{r}\log (n/\overline{r}))\) words, where r and \(\overline{r}\) are the numbers of runs in the Burrows-Wheeler transform of T and of its reverse. In this paper we describe a fully-functional bidirectional index that takes \(O(m+r+\overline{r})\) words, where m is the number of maximal repeats of T, as well as a variant that takes \(O(r+\overline{r})\) words.
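To make the quantities r and \(\overline{r}\) concrete, the following is a minimal sketch that computes the Burrows-Wheeler transform of a short string and of its reverse, and counts the runs in each. It is an illustration only, not the construction used in the paper: the function names, the `$` terminator, and the quadratic sorted-rotations method are all assumptions made for brevity.

```python
def bwt(text, sentinel="$"):
    """Burrows-Wheeler transform via sorted rotations.

    Assumption: `sentinel` does not occur in `text` and sorts before
    every character of `text`. Quadratic in time and space; for
    illustration on small inputs only.
    """
    assert sentinel not in text
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def count_runs(s):
    """Number of maximal runs of equal consecutive characters in s."""
    return sum(1 for i, c in enumerate(s) if i == 0 or c != s[i - 1])


def run_counts(text):
    """Return (r, r_bar): runs in the BWT of text and of its reverse."""
    return count_runs(bwt(text)), count_runs(bwt(text[::-1]))


if __name__ == "__main__":
    r, r_bar = run_counts("banana")
    print(bwt("banana"), r, r_bar)
```

On highly repetitive inputs both counts tend to be much smaller than n, which is what makes space bounds expressed in r and \(\overline{r}\) attractive.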
Notes
1. \(\mathtt{select}_{a}(S,i)\) is the well-known select operation on string S with character a and rank i, and C[a] for \(a \in [0..\sigma ]\) contains the number of occurrences of all characters smaller than a in lexicographic order.
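The two primitives in this note can be sketched as follows. This is a naive linear-scan version meant only to pin down the definitions. The 0-based positions, the 1-based rank i, the return value of -1 for a missing occurrence, and the use of a dictionary keyed by characters (rather than an integer alphabet \([0..\sigma ]\)) are assumptions of this sketch, not conventions taken from the paper.

```python
def select(s, a, i):
    """Position (0-based) of the i-th occurrence (i >= 1) of a in s.

    Returns -1 if a occurs fewer than i times. Linear scan; succinct
    data structures answer the same query in constant or near-constant time.
    """
    seen = 0
    for pos, c in enumerate(s):
        if c == a:
            seen += 1
            if seen == i:
                return pos
    return -1


def build_c(s):
    """Map each character a of s to the number of characters of s
    that are strictly smaller than a in lexicographic order."""
    counts = {}
    for c in s:
        counts[c] = counts.get(c, 0) + 1
    c_array = {}
    total = 0
    for a in sorted(counts):
        c_array[a] = total
        total += counts[a]
    return c_array


if __name__ == "__main__":
    s = "annb$aa"
    print(select(s, "a", 2), build_c(s))
```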
Acknowledgements
We thank Timothy Chan for insights on static weighted 2D orthogonal range counting, and Gene Myers and the Myers’ lab for hosting and fruitful discussions.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Belazzougui, D., Cunial, F. (2020). Smaller Fully-Functional Bidirectional BWT Indexes. In: Boucher, C., Thankachan, S.V. (eds) String Processing and Information Retrieval. SPIRE 2020. Lecture Notes in Computer Science, vol 12303. Springer, Cham. https://doi.org/10.1007/978-3-030-59212-7_4
DOI: https://doi.org/10.1007/978-3-030-59212-7_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59211-0
Online ISBN: 978-3-030-59212-7
eBook Packages: Computer Science (R0)