Abstract
We present a new text indexing structure based on the run-length encoding (RLE) of a text string \(T\). Given the RLE of a query pattern \(P\), it reports all \(occ\) occurrences of \(P\) in \(T\) in \(O(m + occ + \log n)\) time, where \(n\) and \(m\) are the sizes of the RLEs of \(T\) and \(P\), respectively. The data structure requires \(n (2\log N + \log n + \log \sigma ) + O(n)\) bits of space, where \(N\) is the length of the uncompressed text string \(T\) and \(\sigma \) is the alphabet size. Moreover, using \(n (3\log N + \log n + \log \sigma ) + 2 \sigma \log \frac{N}{\sigma } + O(n \log \log n)\) bits of total space, the data structure can be enhanced to return, for a given rank \(i\), the beginning position of the lexicographically \(i\)th smallest suffix of \(T\) in \(O(\log ^2 n)\) time. All of these data structures can be constructed in \(O(n \log n)\) time using \(O(n \log N)\) bits of extra space.
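To make the setting concrete, the following is a minimal sketch of what it means to search with the RLE of \(P\) in the RLE of \(T\). It is not the indexing structure of the paper: it is a naive scan over the runs, taking roughly \(O(nm)\) time, and the function names `rle` and `rle_occurrences` are hypothetical names chosen for this illustration. It relies on the observation that, for a pattern with at least two runs, the first pattern run must match a suffix of a text run, the last pattern run must match a prefix of a text run, and all inner runs must match exactly.

```python
from itertools import groupby


def rle(s):
    """Return the run-length encoding of s as a list of (char, run_length) pairs."""
    return [(c, len(list(g))) for c, g in groupby(s)]


def rle_occurrences(text_rle, pat_rle):
    """Naive scan (illustration only, not the paper's index).

    Returns the 0-based start positions, in the uncompressed text,
    of every occurrence of the pattern.
    """
    n, m = len(text_rle), len(pat_rle)
    if m == 0:
        return []

    # starts[i] = position in the uncompressed text where run i begins
    starts, pos = [], 0
    for _, length in text_rle:
        starts.append(pos)
        pos += length

    occ = []
    if m == 1:
        # A single-run pattern fits at several offsets inside any long enough run.
        pc, pl = pat_rle[0]
        for i, (c, l) in enumerate(text_rle):
            if c == pc and l >= pl:
                occ.extend(range(starts[i], starts[i] + l - pl + 1))
        return occ

    (fc, fl), (lc, ll) = pat_rle[0], pat_rle[-1]
    middle = pat_rle[1:-1]
    for i in range(n - m + 1):
        c0, l0 = text_rle[i]          # text run aligned with the first pattern run
        c1, l1 = text_rle[i + m - 1]  # text run aligned with the last pattern run
        if (c0 == fc and l0 >= fl and c1 == lc and l1 >= ll
                and text_rle[i + 1:i + m - 1] == middle):
            # The first pattern run occupies the last fl characters of text run i.
            occ.append(starts[i] + l0 - fl)
    return occ


T = "aaabbbbaabbbb"          # N = 13 characters
P = "abbbba"
print(rle(T))                 # n = 4 runs
print(rle_occurrences(rle(T), rle(P)))
```

Here the text has \(N = 13\) characters but only \(n = 4\) runs, which is the regime in which bounds expressed in \(n\) and \(m\) rather than \(N\) are attractive.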
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Tamakoshi, Y., Goto, K., Inenaga, S., Bannai, H., Takeda, M. (2015). An Opportunistic Text Indexing Structure Based on Run Length Encoding. In: Paschos, V., Widmayer, P. (eds) Algorithms and Complexity. CIAC 2015. Lecture Notes in Computer Science, vol 9079. Springer, Cham. https://doi.org/10.1007/978-3-319-18173-8_29
DOI: https://doi.org/10.1007/978-3-319-18173-8_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-18172-1
Online ISBN: 978-3-319-18173-8
eBook Packages: Computer Science, Computer Science (R0)