LZ-ABT: A Practical Algorithm for $\alpha$-Balanced Grammar Compression

Ohno, Tatsuya; Goto, Keisuke; Takabatake, Yoshimasa; I, Tomohiro; Sakamoto, Hiroshi

doi:10.1007/978-3-319-94667-2_27

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10979)

Included in the following conference series:

International Workshop on Combinatorial Algorithms

Abstract

We propose a new LZ78-style grammar compression algorithm, named LZ-ABT. It is a simple online algorithm that, given a string of length $N$ over an alphabet of size $\sigma$, creates an $\alpha$-balanced grammar in $O(N \log N \log \sigma)$ time and $O(n)$ space in addition to the input string, where $n$ is the size of the output grammar. LZ-ABT avoids the $\Omega(N^{5/4})$ time lower bound of the naive algorithms for LZMW and LZD, two other LZ78-style compression algorithms, which was observed in [Badkobeh et al. SPIRE 2017, pp. 51–67]. We also show that the algorithm can be executed in compressed space, i.e., without storing the whole input string explicitly in memory: in $O(N \log^2 N \log \sigma)$ time and $O(n)$ space, or in $O(N \log N \log \sigma)$ time and $O(n \log^{*} N)$ space. We implement LZ-ABT running in $O(N \log N \log \sigma)$ time and $O(N)$ space and empirically show that its performance is competitive with that of LZD. To the best of our knowledge, this is the first practical implementation of $\alpha$-balanced grammar compression.
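
To make the term $\alpha$-balanced concrete, the following minimal sketch checks the balance condition on a small grammar. It assumes the usual definition from Charikar et al. (listed in the references): every rule $X \rightarrow YZ$ satisfies $\alpha/(1-\alpha) \le |Y|/|Z| \le (1-\alpha)/\alpha$, which is equivalent to each child deriving at least an $\alpha$ fraction of the parent's expansion. The dictionary-based rule format and the function names are hypothetical choices for illustration; this is not the LZ-ABT algorithm itself.

```python
from fractions import Fraction


def expansion_lengths(rules):
    """Return the expansion length of every nonterminal.

    `rules` maps a nonterminal name to a (left, right) pair.
    Any symbol that is not a key of `rules` is treated as a terminal
    of length 1 (an assumption of this sketch).
    """
    lengths = {}

    def length(sym):
        if sym not in rules:
            return 1
        if sym not in lengths:
            left, right = rules[sym]
            lengths[sym] = length(left) + length(right)
        return lengths[sym]

    for sym in rules:
        length(sym)
    return lengths


def is_alpha_balanced(rules, alpha):
    """Check that both children of every rule derive at least
    alpha * (length of the parent) characters."""
    lengths = expansion_lengths(rules)

    def size(sym):
        return lengths.get(sym, 1)

    for left, right in rules.values():
        total = size(left) + size(right)
        if size(left) < alpha * total or size(right) < alpha * total:
            return False
    return True


# A perfectly balanced grammar for "abababab".
balanced = {"X1": ("a", "b"), "X2": ("X1", "X1"), "X3": ("X2", "X2")}
# A left-leaning chain for "abcde": the top rule splits 4 versus 1.
chain = {"Y1": ("a", "b"), "Y2": ("Y1", "c"), "Y3": ("Y2", "d"), "Y4": ("Y3", "e")}

print(is_alpha_balanced(balanced, Fraction(1, 4)))  # True
print(is_alpha_balanced(chain, Fraction(1, 4)))     # False
```

Exact fractions are used for $\alpha$ so that boundary cases such as a 3-versus-1 split at $\alpha = 1/4$ are not affected by floating-point rounding.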

Notes

1. We remark that the $\log \sigma$ multiplicative factor in the running time is the cost of conducting a binary search at internal nodes in the Patricia tree, and can be removed by using a hash function if we allow its non-deterministic behavior.
2. Of course, we ignore any trivial input string of length one or zero.
3. Since $S_{\ell}$ is represented in $\mathcal{T}_{V}$, we can take a shortcut by starting the traversal from the node representing $S_{\ell}$, but this does not change the complexity.
4. https://github.com/kg86/lzd
5. http://pizzachili.dcc.uchile.cl/texts.html
6. http://pizzachili.dcc.uchile.cl/repcorpus/real/
7. https://bitbucket.org/dkosolobov/lzd-lzmw
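
As a rough illustration of the trade-off in Note 1, the sketch below contrasts two ways a trie node might locate the child for a given branching character: a binary search over sorted child labels, costing $O(\log \sigma)$ comparisons per node, and a hash table with expected constant-time lookup. The class names are hypothetical and this is not the paper's Patricia tree; note also that inserting into a Python list is linear in the node's degree, so only the lookup cost is representative here.

```python
from bisect import bisect_left


class SortedChildren:
    """Children kept in label order; lookup is a binary search."""

    def __init__(self):
        self.labels = []  # sorted branching characters
        self.nodes = []   # nodes[i] is the child reached via labels[i]

    def insert(self, ch, node):
        i = bisect_left(self.labels, ch)
        if i < len(self.labels) and self.labels[i] == ch:
            self.nodes[i] = node  # overwrite an existing child
        else:
            self.labels.insert(i, ch)
            self.nodes.insert(i, node)

    def find(self, ch):
        i = bisect_left(self.labels, ch)
        if i < len(self.labels) and self.labels[i] == ch:
            return self.nodes[i]
        return None


class HashedChildren:
    """Children kept in a hash table; lookup is expected O(1)."""

    def __init__(self):
        self.table = {}

    def insert(self, ch, node):
        self.table[ch] = node

    def find(self, ch):
        return self.table.get(ch)


# Both containers answer the same queries; only the cost model differs.
for children in (SortedChildren(), HashedChildren()):
    for ch, node in (("c", 3), ("a", 1), ("b", 2)):
        children.insert(ch, node)
    print(children.find("b"), children.find("z"))  # 2 None
```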

References

Badkobeh, G., Gagie, T., Inenaga, S., Kociumaka, T., Kosolobov, D., Puglisi, S.J.: On two LZ78-style grammars: compression bounds and compressed-space computation. In: Fici, G., Sciortino, M., Venturini, R. (eds.) SPIRE 2017. LNCS, vol. 10508, pp. 51–67. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-67428-5_5
Charikar, M., Lehman, E., Liu, D., Panigrahy, R., Prabhakaran, M., Sahai, A., Shelat, A.: The smallest grammar problem. IEEE Trans. Inf. Theory 51(7), 2554–2576 (2005)
Gagie, T., Gawrychowski, P., Kärkkäinen, J., Nekrich, Y., Puglisi, S.J.: A faster grammar-based self-index. In: Dediu, A.-H., Martín-Vide, C. (eds.) LATA 2012. LNCS, vol. 7183, pp. 240–251. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-28332-1_21
Goto, K., Bannai, H., Inenaga, S., Takeda, M.: LZD Factorization: simple and practical online grammar compression with variable-to-fixed encoding. In: Cicalese, F., Porat, E., Vaccaro, U. (eds.) CPM 2015. LNCS, vol. 9133, pp. 219–230. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-19929-0_19
Hucke, D., Lohrey, M., Reh, C.P.: The smallest grammar problem revisited. In: Inenaga, S., Sadakane, K., Sakai, T. (eds.) SPIRE 2016. LNCS, vol. 9954, pp. 35–49. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46049-9_4
Jez, A.: Approximation of grammar-based compression via recompression. Theor. Comput. Sci. 592, 115–134 (2015)
Jez, A.: A really simple approximation of smallest grammar. Theor. Comput. Sci. 616, 141–150 (2016)
Larsson, N.J., Moffat, A.: Offline dictionary-based compression. In: Data Compression Conference, DCC 1999, pp. 296–305 (1999)
Lohrey, M.: Algorithmics on SLP-compressed strings: a survey. Groups Complex. Cryptol. 4(2), 241–299 (2012)
Miller, V.S., Wegman, M.N.: Variations on a theme by Ziv and Lempel. In: Apostolico, A., Galil, Z. (eds.) Combinatorial Algorithms on Words. NATO ASI Series, vol. 12, pp. 131–140. Springer, Heidelberg (1985)
Nelson, G., Kieffer, J., Cosman, P.: An interesting hierarchical lossless data compression algorithm. In: IEEE Information Theory Society Workshop (1995)
Nevill-Manning, C.G., Witten, I.H.: Identifying hierarchical structure in sequences: a linear-time algorithm. J. Artif. Intell. Res. (JAIR) 7, 67–82 (1997)
Rytter, W.: Application of Lempel-Ziv factorization to the approximation of grammar-based compression. Theor. Comput. Sci. 302(1–3), 211–222 (2003)
Sakamoto, H.: A fully linear-time approximation algorithm for grammar-based compression. J. Discrete Algorithms 3(2–4), 416–430 (2005)
Storer, J.A., Szymanski, T.G.: The macro model for data compression (extended abstract). In: Proceedings of the 10th Annual ACM Symposium on Theory of Computing, pp. 30–39 (1978)
Takabatake, Y., I, T., Sakamoto, H.: A space-optimal grammar compression. In: Proceedings of ESA 2017, pp. 67:1–67:15 (2017)
Ziv, J., Lempel, A.: Compression of individual sequences via variable-length coding. IEEE Trans. Inf. Theory 24(5), 530–536 (1978)

Acknowledgments

This work was supported by JST CREST (Grant Number JPMJCR1402), and KAKENHI (Grant Numbers 18K18111, 17H01791 and 16K16009).

Author information

Authors and Affiliations

Kyushu Institute of Technology, Kitakyushu, Japan
Tatsuya Ohno, Yoshimasa Takabatake, Tomohiro I & Hiroshi Sakamoto
Fujitsu Laboratories Ltd., Kawasaki, Japan
Keisuke Goto

Corresponding author

Correspondence to Hiroshi Sakamoto.

Editor information

Editors and Affiliations

Department of Informatics, King’s College London, London, United Kingdom
Costas Iliopoulos
Department of Computer Science, School of Computing, National University of Singapore, Singapore, Singapore
Hon Wai Leong
Department of Computer Science, School of Computing, National University of Singapore, Singapore, Singapore
Wing-Kin Sung

Cite this paper

Ohno, T., Goto, K., Takabatake, Y., I, T., Sakamoto, H. (2018). LZ-ABT: A Practical Algorithm for $\alpha$-Balanced Grammar Compression. In: Iliopoulos, C., Leong, H., Sung, WK. (eds) Combinatorial Algorithms. IWOCA 2018. Lecture Notes in Computer Science, vol 10979. Springer, Cham. https://doi.org/10.1007/978-3-319-94667-2_27

DOI: https://doi.org/10.1007/978-3-319-94667-2_27
Published: 04 July 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-94666-5
Online ISBN: 978-3-319-94667-2
eBook Packages: Computer Science, Computer Science (R0)
