Skip to main content

LZ-ABT: A Practical Algorithm for \(\alpha \)-Balanced Grammar Compression

  • Conference paper
  • First Online:

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 10979))

Abstract

We propose a new LZ78-style grammar compression algorithm, named LZ-ABT, which is a simple online algorithm to create, given a string of length N over an alphabet of size \(\sigma \), an \(\alpha \)-balanced grammar in \(O(N \log N \log \sigma )\) time and O(n) space in addition to the input string, where n is the grammar size to output. LZ-ABT can avoid the lower-bound of \(\varOmega (N^{5/4})\) time of the naive algorithms for LZMW and LZD, other LZ78-style compression algorithms, which was observed in [Badkobeh et al. SPIRE 2017, pp. 51–67]. We also show that the algorithm can be executed in compressed space, i.e., without storing the whole input string explicitly in memory: in \(O(N \log ^2 N \log \sigma )\) time and O(n) space, or \(O(N \log N \log \sigma )\) time and \(O(n \log ^{*} N)\) space. We implement LZ-ABT running in \(O(N \log N \log \sigma )\) time and O(N) space and empirically show that its performance is competitive to LZD. This is the first practical implementation of \(\alpha \)-balanced grammar compression to the best of our knowledge.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Notes

  1. 1.

    We remark that the \(\log \sigma \) multiplicative factor in the running time is the cost to conduct a binary search at internal nodes in the Patricia tree, and can be removed by using hash function if we allow its non-deterministic behavior.

  2. 2.

    Of course, we ignore any trivial input string of length one or zero.

  3. 3.

    Since \(S_{\ell }\) is represented in \(\mathcal {T}_{ V }\), we can shortcut by starting the traversal from the node representing \(S_{\ell }\), but it does not change the complexity.

  4. 4.

    https://github.com/kg86/lzd.

  5. 5.

    http://pizzachili.dcc.uchile.cl/texts.html.

  6. 6.

    http://pizzachili.dcc.uchile.cl/repcorpus/real/.

  7. 7.

    https://bitbucket.org/dkosolobov/lzd-lzmw.

References

  1. Badkobeh, G., Gagie, T., Inenaga, S., Kociumaka, T., Kosolobov, D., Puglisi, S.J.: On two LZ78-style grammars: compression bounds and compressed-space computation. In: Fici, G., Sciortino, M., Venturini, R. (eds.) SPIRE 2017. LNCS, vol. 10508, pp. 51–67. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-67428-5_5

    Chapter  Google Scholar 

  2. Charikar, M., Lehman, E., Liu, D., Panigrahy, R., Prabhakaran, M., Sahai, A., Shelat, A.: The smallest grammar problem. IEEE Trans. Inf. Theory 51(7), 2554–2576 (2005)

    Article  MathSciNet  Google Scholar 

  3. Gagie, T., Gawrychowski, P., Kärkkäinen, J., Nekrich, Y., Puglisi, S.J.: A faster grammar-based self-index. In: Dediu, A.-H., Martín-Vide, C. (eds.) LATA 2012. LNCS, vol. 7183, pp. 240–251. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-28332-1_21

    Chapter  Google Scholar 

  4. Goto, K., Bannai, H., Inenaga, S., Takeda, M.: LZD Factorization: simple and practical online grammar compression with variable-to-fixed encoding. In: Cicalese, F., Porat, E., Vaccaro, U. (eds.) CPM 2015. LNCS, vol. 9133, pp. 219–230. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-19929-0_19

    Chapter  Google Scholar 

  5. Hucke, D., Lohrey, M., Reh, C.P.: The smallest grammar problem revisited. In: Inenaga, S., Sadakane, K., Sakai, T. (eds.) SPIRE 2016. LNCS, vol. 9954, pp. 35–49. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46049-9_4

    Chapter  Google Scholar 

  6. Jez, A.: Approximation of grammar-based compression via recompression. Theor. Comput. Sci. 592, 115–134 (2015)

    Article  MathSciNet  Google Scholar 

  7. Jez, A.: A really simple approximation of smallest grammar. Theor. Comput. Sci. 616, 141–150 (2016)

    Article  MathSciNet  Google Scholar 

  8. Larsson, N.J., Moffat, A.: Offline dictionary-based compression. In: Data Compression Conference, DCC 1999, pp. 296–305 (1999)

    Google Scholar 

  9. Lohrey, M.: Algorithmics on SLP-compressed strings: a survey. Groups Complex. Cryptol. 4(2), 241–299 (2012)

    Article  MathSciNet  Google Scholar 

  10. Miller, V.S., Wegman, M.N.: Variations on a theme by Ziv and Lempel. In: Apostolico, A., Galil, Z. (eds.) Combinatorial Algorithms on Words. NATO ASI Series, vol. 12, pp. 131–140. Springer, Heidelberg (1985)

    Chapter  Google Scholar 

  11. Nelson, G., Kieffer, J., Cosman, P.: An interesting hierarchical lossless data compression algorithm. In: IEEE Information Theory Society Workshop (1995)

    Google Scholar 

  12. Nevill-Manning, C.G., Witten, I.H.: Identifying hierarchical strcture in sequences: a linear-time algorithm. J. Artif. Intell. Res. (JAIR) 7, 67–82 (1997)

    MATH  Google Scholar 

  13. Rytter, W.: Application of Lempel-Ziv factorization to the approximation of grammar-based compression. Theor. Comput. Sci. 302(1–3), 211–222 (2003)

    Article  MathSciNet  Google Scholar 

  14. Sakamoto, H.: A fully linear-time approximation algorithm for grammar-based compression. J. Discrete Algorithms 3(2–4), 416–430 (2005)

    Article  MathSciNet  Google Scholar 

  15. Storer, J.A., Szymanski, T.G.: The macro model for data compression (extended abstract). In: Proceedings of the 10th Annual ACM Symposium on Theory of Computing, pp. 30–39 (1978)

    Google Scholar 

  16. Takabatake, Y., I, T., Sakamoto, H.: A space-optimal grammar compression. In: Proceedings of ESA 2017, pp. 67:1–67:15 (2017)

    Google Scholar 

  17. Ziv, J., Lempel, A.: Compression of individual sequences via variable-length coding. IEEE Trans. Inf. Theory 24(5), 530–536 (1978)

    Article  Google Scholar 

Download references

Acknowledgments

This work was supported by JST CREST (Grant Number JPMJCR1402), and KAKENHI (Grant Numbers 18K18111, 17H01791 and 16K16009).

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Hiroshi Sakamoto .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Ohno, T., Goto, K., Takabatake, Y., I, T., Sakamoto, H. (2018). LZ-ABT: A Practical Algorithm for \(\alpha \)-Balanced Grammar Compression. In: Iliopoulos, C., Leong, H., Sung, WK. (eds) Combinatorial Algorithms. IWOCA 2018. Lecture Notes in Computer Science(), vol 10979. Springer, Cham. https://doi.org/10.1007/978-3-319-94667-2_27

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-94667-2_27

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-94666-5

  • Online ISBN: 978-3-319-94667-2

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics