Abstract
Byte codes are a practical alternative to the traditional bit-oriented compression approaches when large alphabets are being used, and trade away a small amount of compression effectiveness for a relatively large gain in decoding efficiency. Byte codes also have the advantage of being searchable using standard string matching techniques. Here we describe methods for searching in byte-coded compressed text and investigate the impact of large alphabets on traditional string matching techniques. We also describe techniques for phrase-based searching in a restricted type of byte code, and present experimental results that compare our adapted methods with previous approaches.
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Brisaboa, N.R., Fariña, A., Navarro, G., Esteller, M.F.: (S,C)-dense coding: An optimized compression code for natural language text databases. In: Nascimento, M.A., de Moura, E.S., Oliveira, A.L. (eds.) SPIRE 2003. LNCS, vol. 2857, pp. 122–136. Springer, Heidelberg (2003)
Brisaboa, N.R., Iglesias, E.L., Navarro, G., Paramá, J.: An efficient compression code for text databases. In: Sebastiani, F. (ed.) ECIR 2003. LNCS, vol. 2633, pp. 468–481. Springer, Heidelberg (2003)
Culpepper, J.S., Moffat, A.: Enhanced byte codes with restricted prefix properties. In: Consens, M.P., Navarro, G. (eds.) SPIRE 2005. LNCS, vol. 3772, pp. 1–12. Springer, Heidelberg (2005)
de Moura, E.S., Navarro, G., Ziviani, N., Baeza-Yates, R.: Fast and flexible word searching on compressed text. ACM Transactions on Information Systems 18(2), 113–139 (2000)
Fariña, A.: New compression codes for text databases. PhD thesis, Universidade de Coruña (April 2005)
Larsson, N.J., Moffat, A.: Offline dictionary-based compression. Proceedings of the IEEE 88(11), 1722–1732 (2000)
Manber, U.: A text compression scheme that allows fast searching directly in the compressed file. ACM Transactions on Information Systems 5(2), 124–136 (1997)
Navarro, G., Raffinot, M.: Flexible pattern matching in strings, 1st edn. Cambridge University Press, Cambridge (2002)
Seuss, D.: Fox in socks. Random House, 1st edn., (Written by T. Geisel) (1965)
Spink, A., Wolfram, D., Jansen, B.J., Saracevic, T.: Searching the web: The public and their queries. Journal of the American Society for Information Science 52(3), 226–234 (2001)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Culpepper, J.S., Moffat, A. (2006). Phrase-Based Pattern Matching in Compressed Text. In: Crestani, F., Ferragina, P., Sanderson, M. (eds) String Processing and Information Retrieval. SPIRE 2006. Lecture Notes in Computer Science, vol 4209. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11880561_28
Download citation
DOI: https://doi.org/10.1007/11880561_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45774-9
Online ISBN: 978-3-540-45775-6
eBook Packages: Computer ScienceComputer Science (R0)