Simple Compression Code Supporting Random Access and Fast String Matching

  • Conference paper
Experimental Algorithms (WEA 2007)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4525)

Abstract

Given a sequence S of n symbols over some alphabet Σ, we develop a new compression method that (i) is very simple to implement; (ii) provides O(1) time random access to any symbol of the original sequence; and (iii) allows efficient pattern matching over the compressed sequence. Our simplest solution uses at most 2h + o(h) bits of space, where h = n(H_0(S) + 1) and H_0(S) is the zeroth-order empirical entropy of S. We discuss a number of improvements and trade-offs over the basic method. The new method is applied to text compression. We also propose average-case optimal string matching algorithms.
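
To make the space bound concrete, the following minimal sketch computes the zeroth-order empirical entropy H_0(S) of a sequence and the leading term 2h of the bound, with h = n(H_0(S) + 1). It illustrates the formula only and is not the paper's compression method; the function names are hypothetical, and the o(h) term is ignored because the abstract does not specify it.

```python
import math
from collections import Counter


def zeroth_order_entropy(seq):
    """Zeroth-order empirical entropy H_0(S), in bits per symbol.

    H_0(S) = sum over distinct symbols c of (n_c / n) * log2(n / n_c),
    where n_c is the number of occurrences of c in S and n = |S|.
    """
    n = len(seq)
    if n == 0:
        return 0.0
    counts = Counter(seq)
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def space_bound_bits(seq):
    """Leading term 2h of the bound, with h = n * (H_0(S) + 1).

    Assumption: the lower-order o(h) term is left out entirely.
    """
    n = len(seq)
    h = n * (zeroth_order_entropy(seq) + 1)
    return 2 * h


if __name__ == "__main__":
    for text in ("aaaa", "abab", "abcd"):
        print(text, zeroth_order_entropy(text), space_bound_bits(text))
```

For a sequence with a single distinct symbol, H_0(S) = 0, so h = n and the leading term is 2n bits; the "+ 1" in h keeps the bound from vanishing on highly skewed inputs.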

Editor information

Camil Demetrescu

Copyright information

© 2007 Springer Berlin Heidelberg

About this paper

Cite this paper

Fredriksson, K., Nikitin, F. (2007). Simple Compression Code Supporting Random Access and Fast String Matching. In: Demetrescu, C. (eds) Experimental Algorithms. WEA 2007. Lecture Notes in Computer Science, vol 4525. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-72845-0_16

  • DOI: https://doi.org/10.1007/978-3-540-72845-0_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-72844-3

  • Online ISBN: 978-3-540-72845-0

  • eBook Packages: Computer Science, Computer Science (R0)
