Context-Sensitive Grammar Transform: Compression and Pattern Matching

  • Conference paper
String Processing and Information Retrieval (SPIRE 2008)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5280)

Abstract

A framework for context-sensitive grammar transforms is proposed. A greedy compression algorithm based on this transform model is presented, together with a Knuth-Morris-Pratt (KMP)-type compressed pattern matching (CPM) algorithm. The compression performance is comparable to that of gzip and Re-Pair. The search speed of our CPM algorithm is almost twice that of the KMP-type CPM algorithm on Byte-Pair-Encoding by Shibata et al. (2000) and, for short patterns, exceeds that of the Boyer-Moore-Horspool algorithm with the stopper encoding by Rautio et al. (2002), which is regarded as one of the best combinations for practically fast search.
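
For orientation, the classical KMP algorithm on uncompressed text, which the "KMP-type" CPM algorithms above take as their starting point, can be sketched as follows. This is a minimal illustration and not the CPM algorithm of the paper: it scans plain characters rather than grammar symbols, and the function names `failure_table` and `kmp_search` are chosen here for illustration only.

```python
def failure_table(pattern: str) -> list[int]:
    """fail[i] is the length of the longest proper border of pattern[:i + 1]."""
    fail = [0] * len(pattern)
    k = 0  # length of the border currently being extended
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_search(text: str, pattern: str) -> list[int]:
    """Return the start index of every occurrence of pattern in text."""
    if not pattern:
        return []
    fail = failure_table(pattern)
    hits = []
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        # On a mismatch, fall back to the longest border instead of rescanning.
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)
            k = fail[k - 1]  # keep going to find overlapping occurrences
    return hits
```

Each text character is read once and the fallback steps are amortized, so the scan runs in time linear in the text length. A compressed variant would, roughly speaking, advance the same kind of automaton by one compressed symbol at a time instead of one character at a time.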

References

  1. Amir, A., Benson, G.: Efficient two-dimensional compressed matching. In: Proc. Data Compression Conference 1992 (DCC 1992), p. 279 (1992)

  2. Amir, A., Benson, G., Farach, M.: Let sleeping files lie: Pattern matching in Z-compressed files. J. Comput. Syst. Sci. 52(2), 299–307 (1996)

  3. Charikar, M., Lehman, E., Liu, D., Panigrahy, R., Prabhakaran, M., Sahai, A., Shelat, A.: The smallest grammar problem. IEEE Transactions on Information Theory 51(7), 2554–2576 (2005)

  4. Crochemore, M., Hancart, C., Lecroq, T.: Algorithms on Strings. Cambridge University Press, Cambridge (2007)

  5. Kida, T., Matsumoto, T., Shibata, Y., Takeda, M., Shinohara, A., Arikawa, S.: Collage systems: a unifying framework for compressed pattern matching. Theoret. Comput. Sci. 298(1), 253–272 (2003)

  6. Knuth, D.E.: The Art of Computer Programming. Seminumerical Algorithms, vol. II. Addison-Wesley, Reading (1981)

  7. Larsson, N.J., Moffat, A.: Off-line dictionary-based compression. Proceedings of the IEEE 88(11), 1722–1732 (2000)

  8. Lothaire, M.: Combinatorics on Words. Cambridge University Press, Cambridge (1983)

  9. Manber, U.: A text compression scheme that allows fast searching directly in the compressed file. ACM Transactions on Information Systems 15(2), 124–136 (1997)

  10. Matsumoto, T., Hagio, K., Takeda, M.: A run-time efficient implementation of compressed pattern matching automata. In: Ibarra, O., Ravikumar, B. (eds.) CIAA 2008. LNCS, vol. 5148. Springer, Heidelberg (2008)

  11. Navarro, G., Raffinot, M.: Practical and flexible pattern matching over Ziv-Lempel compressed text. J. Discrete Algorithms 2(3), 347–371 (2004)

  12. Rautio, J., Tanninen, J., Tarhio, J.: String matching with stopper encoding and code splitting. In: Apostolico, A., Takeda, M. (eds.) CPM 2002. LNCS, vol. 2373, pp. 42–52. Springer, Heidelberg (2002)

  13. Rytter, W.: Application of Lempel-Ziv factorization to the approximation of grammar-based compression. Theor. Comput. Sci. 302(1-3), 211–222 (2003)

  14. Sakamoto, H.: A fully linear-time approximation algorithm for grammar-based compression. J. Discrete Algorithms 3(2-4), 416–430 (2005)

  15. Sakamoto, H., Kida, T., Shimozono, S.: A space-saving linear-time algorithm for grammar-based compression. In: Apostolico, A., Melucci, M. (eds.) SPIRE 2004. LNCS, vol. 3246, pp. 218–229. Springer, Heidelberg (2004)

  16. Shibata, Y., Kida, T., Fukamachi, S., Takeda, M., Shinohara, A., Shinohara, T., Arikawa, S.: Speeding up pattern matching by text compression. In: Bongiovanni, G., Petreschi, R., Gambosi, G. (eds.) CIAC 2000. LNCS, vol. 1767, pp. 306–315. Springer, Heidelberg (2000)

  17. Shibata, Y., Matsumoto, T., Takeda, M., Shinohara, A., Arikawa, S.: A Boyer-Moore type algorithm for compressed pattern matching. In: Giancarlo, R., Sankoff, D. (eds.) CPM 2000. LNCS, vol. 1848, pp. 181–194. Springer, Heidelberg (2000)

  18. Yang, E.-H., He, D.-K.: Efficient universal lossless data compression algorithms based on a greedy sequential grammar transform - part two: With context models. IEEE Transactions on Information Theory 49(11), 2874–2894 (2003)

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Maruyama, S., Tanaka, Y., Sakamoto, H., Takeda, M. (2008). Context-Sensitive Grammar Transform: Compression and Pattern Matching. In: Amir, A., Turpin, A., Moffat, A. (eds) String Processing and Information Retrieval. SPIRE 2008. Lecture Notes in Computer Science, vol 5280. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89097-3_5

  • DOI: https://doi.org/10.1007/978-3-540-89097-3_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-89096-6

  • Online ISBN: 978-3-540-89097-3

  • eBook Packages: Computer Science, Computer Science (R0)
