Abstract
A framework of context-sensitive grammar transform is proposed. A greedy compression algorithm with the transform model is presented as well as a Knuth-Morris-Pratt (KMP)-type compressed pattern matching (CPM) algorithm. The compression performance is a match for gzip and Re-Pair. The search speed of our CPM algorithm is almost twice faster than the KMP type CPM algorithm on Byte-Pair-Encoding by Shibata et al. (2000), and in the case of short patterns, faster than the Boyer-Moore-Horspool algorithm with the stopper encoding by Rautio et al. (2002), which is regarded as one of the best combinations that allows a practically fast search.
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Amir, A., Benson, G.: Efficient two-dimensional compressed matching. In: Proc. Data Compression Conference 1992 (DCC 1992), p. 279 (1992)
Amir, A., Benson, G., Farach, M.: Let sleeping files lie: Pattern matching in Z-compressed files. J. Comput. Syst. Sci. 52(2), 299–307 (1996)
Charikar, M., Lehman, E., Liu, D., Panigrahy, R., Prabhakaran, M., Sahai, A., Shelat, A.: The smallest grammar problem. IEEE Transactions on Information Theory 51(7), 2554–2576 (2005)
Crochemore, M., Hancart, C., Lecroq, T.: Algorithms on Strings. Cambridge University Press, Cambridge (2007)
Kida, T., Matsumoto, T., Shibata, Y., Takeda, M., Shinohara, A., Arikawa, S.: Collage systems: a unifying framework for compressed pattern matching. Theoret. Comput. Sci. 298(1), 253–272 (2003)
Knuth, D.E.: The Art of Computer Programming. Seminumerical Algorithms, vol. II. Addison-Wesley, Reading (1981)
Larsson, N.J., Moffat, A.: Off-line dictionary-based compression. Proceedings of the IEEE 88(11), 1722–1732 (2000)
Lothaire, M.: Combinatorics on Words. Cambridge University Press, Cambridge (1983)
Manber, U.: A text compression scheme that allows fast searching directly in the compressed file. ACM Transactions on Information Systems 15(2), 124–136 (1997)
Matsumoto, T., Hagio, K., Takeda, M.: A run-time efficient implementation of compressed pattern matching automata. In: Ibarra, O., Ravikumar, B. (eds.) CIAA 2008. LNCS, vol. 5148. Springer, Heidelberg (2008)
Navarro, G., Raffinot, M.: Practical and flexible pattern matching over Ziv-Lempel compressed text. J. Discrete Algorithms 2(3), 347–371 (2004)
Rautio, J., Tanninen, J., Tarhio, J.: String matching with stopper encoding and code splitting. In: Apostolico, A., Takeda, M. (eds.) CPM 2002. LNCS, vol. 2373, pp. 42–52. Springer, Heidelberg (2002)
Rytter, W.: Application of lempel-ziv factorization to the approximation of grammar-based compression. Theor. Comput. Sci. 302(1-3), 211–222 (2003)
Sakamoto, H.: A fully linear-time approximation algorithm for grammar-based compression. J. Discrete Algorithms 3(2-4), 416–430 (2005)
Sakamoto, H., Kida, T., Shimozono, S.: A space-saving linear-time algorithm for grammar-based compression. In: Apostolico, A., Melucci, M. (eds.) SPIRE 2004. LNCS, vol. 3246, pp. 218–229. Springer, Heidelberg (2004)
Shibata, Y., Kida, T., Fukamachi, S., Takeda, M., Shinohara, A., Shinohara, T., Arikawa, S.: Speeding up pattern matching by text compression. In: Bongiovanni, G., Petreschi, R., Gambosi, G. (eds.) CIAC 2000. LNCS, vol. 1767, pp. 306–315. Springer, Heidelberg (2000)
Shibata, Y., Matsumoto, T., Takeda, M., Shinohara, A., Arikawa, S.: A Boyer-Moore type algorithm for compressed pattern matching. In: Giancarlo, R., Sankoff, D. (eds.) CPM 2000. LNCS, vol. 1848, pp. 181–194. Springer, Heidelberg (2000)
Yang, E.-H., He, D.-K.: Efficient universal lossless data compression algorithms based on a greedy sequential grammar transform - part two: With context models. IEEE Transactions on Information Theory 49(11), 2874–2894 (2003)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Maruyama, S., Tanaka, Y., Sakamoto, H., Takeda, M. (2008). Context-Sensitive Grammar Transform: Compression and Pattern Matching. In: Amir, A., Turpin, A., Moffat, A. (eds) String Processing and Information Retrieval. SPIRE 2008. Lecture Notes in Computer Science, vol 5280. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-89097-3_5
Download citation
DOI: https://doi.org/10.1007/978-3-540-89097-3_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-89096-6
Online ISBN: 978-3-540-89097-3
eBook Packages: Computer ScienceComputer Science (R0)