Abstract
To store and search genomic databases efficiently, researchers have recently started building compressed self-indexes based on straight-line programs and LZ77. In this paper we show how, given a balanced straight-line program for a string S[1..n] whose LZ77 parse consists of z phrases, we can add \(\mathcal{O}(z \log\log z)\) words and obtain a compressed self-index for S such that, given a pattern P[1..m], we can list the occ occurrences of P in S in \(\mathcal{O}(m^{2} + (m + occ) \log\log n)\) time. All previous self-indexes are either larger or slower in the worst case.
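For readers unfamiliar with the term, a straight-line program is a context-free grammar that generates exactly one string, here with each rule either a single character or a pair of earlier nonterminals. The following is a minimal illustrative sketch, not the index described in the abstract: it builds a small balanced straight-line program for the string "abababab" and reads individual characters S[i] without decompressing the whole string. The rule names (X1 to X5) and the helpers `expansion_lengths`, `access` and `expand` are hypothetical names chosen for this example.

```python
# Illustrative sketch only: a tiny straight-line program (SLP) and random
# access into the string it generates. All names here are hypothetical.

def expansion_lengths(rules, order):
    """Return the length of the string each nonterminal expands to.

    `order` lists the nonterminals bottom-up, so both children of a
    binary rule are processed before the rule itself.
    """
    length = {}
    for name in order:
        rhs = rules[name]
        if isinstance(rhs, str):      # terminal rule: X -> 'a'
            length[name] = 1
        else:                         # binary rule: X -> Y Z
            left, right = rhs
            length[name] = length[left] + length[right]
    return length


def access(rules, length, start, i):
    """Return S[i] (1-based) by walking down from the start symbol."""
    node = start
    while not isinstance(rules[node], str):
        left, right = rules[node]
        if i <= length[left]:
            node = left               # position falls in the left child
        else:
            i -= length[left]         # skip the left child's expansion
            node = right
    return rules[node]


def expand(rules, start):
    """Fully decompress the SLP (only used here to check `access`)."""
    rhs = rules[start]
    if isinstance(rhs, str):
        return rhs
    return expand(rules, rhs[0]) + expand(rules, rhs[1])


# A balanced SLP with 5 rules for S = "abababab" (n = 8).
RULES = {
    "X1": "a",
    "X2": "b",
    "X3": ("X1", "X2"),   # ab
    "X4": ("X3", "X3"),   # abab
    "X5": ("X4", "X4"),   # abababab
}
ORDER = ["X1", "X2", "X3", "X4", "X5"]
LENGTH = expansion_lengths(RULES, ORDER)
S = expand(RULES, "X5")
```

Because the parse tree in this example is balanced, `access` visits a number of rules logarithmic in n. The self-index in the abstract adds further structures on top of such a grammar to support pattern search, which this sketch does not attempt.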
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Gagie, T., Gawrychowski, P., Kärkkäinen, J., Nekrich, Y., Puglisi, S.J. (2012). A Faster Grammar-Based Self-index. In: Dediu, AH., Martín-Vide, C. (eds) Language and Automata Theory and Applications. LATA 2012. Lecture Notes in Computer Science, vol 7183. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28332-1_21
DOI: https://doi.org/10.1007/978-3-642-28332-1_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28331-4
Online ISBN: 978-3-642-28332-1
eBook Packages: Computer Science, Computer Science (R0)