Abstract
Given a straight line program of size n, we are interested in constructing the LZ78 factorization of the corresponding text. We show how to perform such conversion in \(\mathcal{O}(n+m\log m)\) time, where m is the number of LZ78 codewords. This improves on the previously known \(\mathcal{O}(n\sqrt{N}+m\log N)\) solution [Bannai et al., SPIRE 2012]. The main tool in our algorithm is a data structure which allows us to efficiently operate on labels of the paths in a growing trie, and a certain method of recompressing the parse whenever it leads to decreasing its size.
This is a preview of subscription content, log in via an institution.
Buying options
Tax calculation will be finalised at checkout
Purchases are for personal use only
Learn about institutional subscriptionsPreview
Unable to display preview. Download preview PDF.
References
Alstrup, S., Holm, J.: Improved algorithms for finding level ancestors in dynamic trees. In: Welzl, E., Montanari, U., Rolim, J.D.P. (eds.) ICALP 2000. LNCS, vol. 1853, pp. 73–84. Springer, Heidelberg (2000)
Bannai, H., Inenaga, S., Takeda, M.: Efficient LZ78 factorization of grammar compressed text. In: Calderón-Benavides, L., González-Caro, C., Chávez, E., Ziviani, N. (eds.) SPIRE 2012. LNCS, vol. 7608, pp. 86–98. Springer, Heidelberg (2012)
Cole, R., Hariharan, R.: Dynamic LCA queries on trees. In: Proc. SODA 1999, pp. 235–244 (1999)
Crochemore, M., Landau, G.M., Ziv-Ukelson, M.: A subquadratic sequence alignment algorithm for unrestricted scoring matrices. SIAM J. Comput. 32(6), 1654–1673 (2003)
Gasieniec, L., Rytter, W.: Almost optimal fully LZW-compressed pattern matching. In: DCC 1999: Proceedings of the Conference on Data Compression, p. 316. IEEE Computer Society, Washington, DC (1999)
Gawrychowski, P.: Faster algorithm for computing the edit distance between SLP-compressed strings. In: Calderón-Benavides, L., González-Caro, C., Chávez, E., Ziviani, N. (eds.) SPIRE 2012. LNCS, vol. 7608, pp. 229–236. Springer, Heidelberg (2012)
Gawrychowski, P.: Tying up the loose ends in fully LZW-compressed pattern matching. In: Proc. STACS 2012, pp. 624–635 (2012)
Goto, K., Bannai, H., Inenaga, S., Takeda, M.: Fast q-gram mining on SLP compressed strings. Journal of Discrete Algorithms 18, 89–99 (2013)
Hermelin, D., Landau, G.M., Landau, S., Weimann, O.: A unified algorithm for accelerating edit-distance computation via text-compression. In: Proc. STACS 2009, pp. 529–540 (2009)
Jeż, A.: Faster fully compressed pattern matching by recompression. In: Czumaj, A., Mehlhorn, K., Pitts, A., Wattenhofer, R. (eds.) ICALP 2012, Part I. LNCS, vol. 7391, pp. 533–544. Springer, Heidelberg (2012)
Karpinski, M., Rytter, W., Shinohara, A.: An efficient pattern-matching algorithm for strings with short descriptions. Nordic Journal of Computing 4, 172–186 (1997)
Larsson, N.J., Moffat, A.: Off-line dictionary-based compression. Proceedings of the IEEE 88(11), 1722–1732 (2000)
Li, M., Chen, X., Li, X., Ma, B., Vitányi, P.M.B.: The similarity metric. IEEE Transactions on Information Theory 50(12), 3250–3264 (2004)
Li, M., Sleep, R.: Genre classification via an LZ78-based string kernel. In: Proc. ISMIR 2005, pp. 252–259 (2005)
Li, M., Sleep, R.: An LZ78 based string kernel. In: Li, X., Wang, S., Dong, Z.Y. (eds.) ADMA 2005. LNCS (LNAI), vol. 3584, pp. 678–689. Springer, Heidelberg (2005)
Li, M., Zhu, Y.: Image classification via LZ78 based string kernel: A comparative study. In: Ng, W.-K., Kitsuregawa, M., Li, J., Chang, K. (eds.) PAKDD 2006. LNCS (LNAI), vol. 3918, pp. 704–712. Springer, Heidelberg (2006)
Lifshits, Y.: Processing compressed texts: A tractability border. In: Ma, B., Zhang, K. (eds.) CPM 2007. LNCS, vol. 4580, pp. 228–240. Springer, Heidelberg (2007)
Miyazaki, M., Shinohara, A., Takeda, M.: An improved pattern matching algorithm for strings in terms of straight-line programs. Journal of Discrete Algorithms 1(1), 187–204 (2000)
Nevill-Manning, C.G., Witten, I.H., Maulsby, D.L.: Compression by induction of hierarchical grammars. In: Proc. DCC 1994. pp. 244–253 (1994)
Rytter, W.: Application of Lempel-Ziv factorization to the approximation of grammar-based compression. Theoretical Computer Science 302(1-3), 211–222 (2003)
Takeda, M., Shibata, Y., Matsumoto, T., Kida, T., Shinohara, A., Fukamachi, S., Shinohara, T., Arikawa, S.: Speeding up string pattern matching by text compression: The dawn of a new era. Transactions of Information Processing Society of Japan 42(3), 370–384 (2001)
Yamamoto, T., Bannai, H., Inenaga, S., Takeda, M.: Faster subsequence and don’t-care pattern matching on compressed texts. In: Giancarlo, R., Manzini, G. (eds.) CPM 2011. LNCS, vol. 6661, pp. 309–322. Springer, Heidelberg (2011)
Ziv, J., Lempel, A.: A universal algorithm for sequential data compression. IEEE Transactions on Information Theory IT-23(3), 337–349 (1977)
Ziv, J., Lempel, A.: Compression of individual sequences via variable-length coding. IEEE Transactions on Information Theory 24(5), 530–536 (1978)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bannai, H., Gawrychowski, P., Inenaga, S., Takeda, M. (2013). Converting SLP to LZ78 in almost Linear Time. In: Fischer, J., Sanders, P. (eds) Combinatorial Pattern Matching. CPM 2013. Lecture Notes in Computer Science, vol 7922. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38905-4_6
Download citation
DOI: https://doi.org/10.1007/978-3-642-38905-4_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-38904-7
Online ISBN: 978-3-642-38905-4
eBook Packages: Computer ScienceComputer Science (R0)