Abstract
Motivated by the rapidly increasing size of genomic databases, code repositories, and versioned texts, several compression schemes have been proposed that work well on highly-repetitive strings and also support fast random access: e.g., LZ-End, RLZ, GDC, augmented SLPs, and block graphs. Block graphs have good worst-case bounds, but whether they are practical has remained an open question. We describe an implementation of block graphs that, for several standard datasets, provides better compression and faster random access than competing schemes.
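To make "fast random access" on a compressed repetitive string concrete, here is a minimal Python sketch for an RLZ-style parse, one of the competing schemes named above. It is not the block-graph implementation this article describes. Assumptions: the names `longest_match`, `rlz_parse` and `RLZString` are hypothetical; each phrase is a `(pos, length, literal)` triple, where `pos = -1` marks a single literal character absent from the reference; and the parser is a naive greedy longest-match search chosen for readability, not speed.

```python
from bisect import bisect_right
from typing import List, Tuple

Phrase = Tuple[int, int, str]  # (pos in reference or -1, length, literal char)


def longest_match(reference: str, target: str, i: int) -> Tuple[int, int]:
    """Naively find the longest prefix of target[i:] occurring in reference."""
    best_pos, best_len = 0, 0
    for p in range(len(reference)):
        l = 0
        while (p + l < len(reference) and i + l < len(target)
               and reference[p + l] == target[i + l]):
            l += 1
        if l > best_len:
            best_pos, best_len = p, l
    return best_pos, best_len


def rlz_parse(reference: str, target: str) -> List[Phrase]:
    """Greedily split target into phrases copied from reference (or literals)."""
    phrases: List[Phrase] = []
    i = 0
    while i < len(target):
        pos, length = longest_match(reference, target, i)
        if length == 0:
            phrases.append((-1, 1, target[i]))  # character not in reference
            i += 1
        else:
            phrases.append((pos, length, ""))
            i += length
    return phrases


class RLZString:
    """Random access to target given only the reference and its phrases."""

    def __init__(self, reference: str, phrases: List[Phrase]) -> None:
        self.reference = reference
        self.phrases = phrases
        self.starts: List[int] = []  # offset in target where each phrase begins
        offset = 0
        for _, length, _ in phrases:
            self.starts.append(offset)
            offset += length
        self.length = offset

    def access(self, i: int) -> str:
        if not 0 <= i < self.length:
            raise IndexError(i)
        k = bisect_right(self.starts, i) - 1  # phrase containing position i
        pos, _, literal = self.phrases[k]
        if pos < 0:
            return literal
        return self.reference[pos + (i - self.starts[k])]


ref = "ACGTACGGTACC"
tgt = "ACGTNACGGTAC"
s = RLZString(ref, rlz_parse(ref, tgt))
print("".join(s.access(i) for i in range(s.length)))  # prints ACGTNACGGTAC
```

Each `access` call is a binary search over the z phrase start offsets, so it costs O(log z) time regardless of how long the phrases are. A practical RLZ parser would find matches with a suffix array of the reference instead of the quadratic scan used here.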
Cite this article
Gagie, T., Hoobin, C. & Puglisi, S.J. Block Graphs in Practice. Math.Comput.Sci. 11, 191–196 (2017). https://doi.org/10.1007/s11786-016-0286-9