Skip to main content

Converting SLP to LZ78 in almost Linear Time

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 7922))

Abstract

Given a straight line program of size n, we are interested in constructing the LZ78 factorization of the corresponding text. We show how to perform such conversion in \(\mathcal{O}(n+m\log m)\) time, where m is the number of LZ78 codewords. This improves on the previously known \(\mathcal{O}(n\sqrt{N}+m\log N)\) solution [Bannai et al., SPIRE 2012]. The main tool in our algorithm is a data structure which allows us to efficiently operate on labels of the paths in a growing trie, and a certain method of recompressing the parse whenever it leads to decreasing its size.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   49.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Alstrup, S., Holm, J.: Improved algorithms for finding level ancestors in dynamic trees. In: Welzl, E., Montanari, U., Rolim, J.D.P. (eds.) ICALP 2000. LNCS, vol. 1853, pp. 73–84. Springer, Heidelberg (2000)

    Chapter  Google Scholar 

  2. Bannai, H., Inenaga, S., Takeda, M.: Efficient LZ78 factorization of grammar compressed text. In: Calderón-Benavides, L., González-Caro, C., Chávez, E., Ziviani, N. (eds.) SPIRE 2012. LNCS, vol. 7608, pp. 86–98. Springer, Heidelberg (2012)

    Chapter  Google Scholar 

  3. Cole, R., Hariharan, R.: Dynamic LCA queries on trees. In: Proc. SODA 1999, pp. 235–244 (1999)

    Google Scholar 

  4. Crochemore, M., Landau, G.M., Ziv-Ukelson, M.: A subquadratic sequence alignment algorithm for unrestricted scoring matrices. SIAM J. Comput. 32(6), 1654–1673 (2003)

    Article  MathSciNet  MATH  Google Scholar 

  5. Gasieniec, L., Rytter, W.: Almost optimal fully LZW-compressed pattern matching. In: DCC 1999: Proceedings of the Conference on Data Compression, p. 316. IEEE Computer Society, Washington, DC (1999)

    Google Scholar 

  6. Gawrychowski, P.: Faster algorithm for computing the edit distance between SLP-compressed strings. In: Calderón-Benavides, L., González-Caro, C., Chávez, E., Ziviani, N. (eds.) SPIRE 2012. LNCS, vol. 7608, pp. 229–236. Springer, Heidelberg (2012)

    Chapter  Google Scholar 

  7. Gawrychowski, P.: Tying up the loose ends in fully LZW-compressed pattern matching. In: Proc. STACS 2012, pp. 624–635 (2012)

    Google Scholar 

  8. Goto, K., Bannai, H., Inenaga, S., Takeda, M.: Fast q-gram mining on SLP compressed strings. Journal of Discrete Algorithms 18, 89–99 (2013)

    Article  MathSciNet  Google Scholar 

  9. Hermelin, D., Landau, G.M., Landau, S., Weimann, O.: A unified algorithm for accelerating edit-distance computation via text-compression. In: Proc. STACS 2009, pp. 529–540 (2009)

    Google Scholar 

  10. Jeż, A.: Faster fully compressed pattern matching by recompression. In: Czumaj, A., Mehlhorn, K., Pitts, A., Wattenhofer, R. (eds.) ICALP 2012, Part I. LNCS, vol. 7391, pp. 533–544. Springer, Heidelberg (2012)

    Chapter  Google Scholar 

  11. Karpinski, M., Rytter, W., Shinohara, A.: An efficient pattern-matching algorithm for strings with short descriptions. Nordic Journal of Computing 4, 172–186 (1997)

    MathSciNet  MATH  Google Scholar 

  12. Larsson, N.J., Moffat, A.: Off-line dictionary-based compression. Proceedings of the IEEE 88(11), 1722–1732 (2000)

    Article  Google Scholar 

  13. Li, M., Chen, X., Li, X., Ma, B., Vitányi, P.M.B.: The similarity metric. IEEE Transactions on Information Theory 50(12), 3250–3264 (2004)

    Article  Google Scholar 

  14. Li, M., Sleep, R.: Genre classification via an LZ78-based string kernel. In: Proc. ISMIR 2005, pp. 252–259 (2005)

    Google Scholar 

  15. Li, M., Sleep, R.: An LZ78 based string kernel. In: Li, X., Wang, S., Dong, Z.Y. (eds.) ADMA 2005. LNCS (LNAI), vol. 3584, pp. 678–689. Springer, Heidelberg (2005)

    Chapter  Google Scholar 

  16. Li, M., Zhu, Y.: Image classification via LZ78 based string kernel: A comparative study. In: Ng, W.-K., Kitsuregawa, M., Li, J., Chang, K. (eds.) PAKDD 2006. LNCS (LNAI), vol. 3918, pp. 704–712. Springer, Heidelberg (2006)

    Chapter  Google Scholar 

  17. Lifshits, Y.: Processing compressed texts: A tractability border. In: Ma, B., Zhang, K. (eds.) CPM 2007. LNCS, vol. 4580, pp. 228–240. Springer, Heidelberg (2007)

    Chapter  Google Scholar 

  18. Miyazaki, M., Shinohara, A., Takeda, M.: An improved pattern matching algorithm for strings in terms of straight-line programs. Journal of Discrete Algorithms 1(1), 187–204 (2000)

    MathSciNet  Google Scholar 

  19. Nevill-Manning, C.G., Witten, I.H., Maulsby, D.L.: Compression by induction of hierarchical grammars. In: Proc. DCC 1994. pp. 244–253 (1994)

    Google Scholar 

  20. Rytter, W.: Application of Lempel-Ziv factorization to the approximation of grammar-based compression. Theoretical Computer Science 302(1-3), 211–222 (2003)

    Article  MathSciNet  MATH  Google Scholar 

  21. Takeda, M., Shibata, Y., Matsumoto, T., Kida, T., Shinohara, A., Fukamachi, S., Shinohara, T., Arikawa, S.: Speeding up string pattern matching by text compression: The dawn of a new era. Transactions of Information Processing Society of Japan 42(3), 370–384 (2001)

    MathSciNet  Google Scholar 

  22. Yamamoto, T., Bannai, H., Inenaga, S., Takeda, M.: Faster subsequence and don’t-care pattern matching on compressed texts. In: Giancarlo, R., Manzini, G. (eds.) CPM 2011. LNCS, vol. 6661, pp. 309–322. Springer, Heidelberg (2011)

    Chapter  Google Scholar 

  23. Ziv, J., Lempel, A.: A universal algorithm for sequential data compression. IEEE Transactions on Information Theory IT-23(3), 337–349 (1977)

    Article  MathSciNet  MATH  Google Scholar 

  24. Ziv, J., Lempel, A.: Compression of individual sequences via variable-length coding. IEEE Transactions on Information Theory 24(5), 530–536 (1978)

    Article  MathSciNet  MATH  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bannai, H., Gawrychowski, P., Inenaga, S., Takeda, M. (2013). Converting SLP to LZ78 in almost Linear Time. In: Fischer, J., Sanders, P. (eds) Combinatorial Pattern Matching. CPM 2013. Lecture Notes in Computer Science, vol 7922. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-38905-4_6

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-38905-4_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-38904-7

  • Online ISBN: 978-3-642-38905-4

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics