
Genetic-Programming Based Prediction of Data Compression Saving

  • Conference paper
Artifical Evolution (EA 2009)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5975)

Abstract

We use Genetic Programming (GP) to generate programs that predict the compression ratio achieved by data compression algorithms. GP evolves programs with multiple components. One component analyses statistical features extracted from a file's byte frequency distribution to produce a compression ratio prediction. Another component does the same, but analyses statistical features extracted from the file's raw ASCII representation. A further evolved component acts as a decision tree that determines the overall output (the compression ratio estimate) returned by an individual. The decision tree produces its result from a series of comparisons among statistical features extracted from the file and the outputs of the two prediction components. It can either return the output of one of the two prediction trees or combine both into an evolved mathematical formula. Experiments with the proposed approach show that GP can accurately estimate the compression ratio of unseen files, thereby avoiding the need to run several compression algorithms on a file to decide which one provides the best results.
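The abstract describes an individual with three parts: a predictor fed by byte-histogram statistics, a predictor fed by raw-sequence statistics, and a decision component that picks or combines their outputs. The sketch below only illustrates that structure. It is not the authors' implementation. The feature sets, the use of zlib as the reference compressor, the definition of saving as `1 - compressed/original`, and all function names are assumptions. The two predictors and the decision tree are hand-written placeholders standing in for trees that GP would evolve.

```python
import math
import statistics
import zlib
from typing import Dict

Features = Dict[str, float]


def compression_saving(data: bytes) -> float:
    """Target value: fraction of the input saved by zlib (assumed stand-in compressor)."""
    if not data:
        raise ValueError("empty input")
    return 1.0 - len(zlib.compress(data, 9)) / len(data)


def histogram_features(data: bytes) -> Features:
    """Statistics of the 256-bin byte frequency distribution (ignores byte order)."""
    counts = [0] * 256
    for b in data:
        counts[b] += 1
    n = len(data)
    probs = [c / n for c in counts if c]
    return {
        "entropy": -sum(p * math.log2(p) for p in probs),  # order-0 entropy, 0..8 bits/byte
        "distinct": float(len(probs)),
        "freq_stdev": statistics.pstdev(counts),
    }


def raw_features(data: bytes) -> Features:
    """Statistics of the raw byte sequence (sensitive to byte order)."""
    values = list(data)
    diffs = [abs(a - b) for a, b in zip(values, values[1:])]
    return {
        "mean": statistics.fmean(values),
        "stdev": statistics.pstdev(values),
        "mean_abs_diff": statistics.fmean(diffs) if diffs else 0.0,
    }


def clamp(x: float) -> float:
    """Keep a predicted saving inside [0, 1]."""
    return max(0.0, min(1.0, x))


# Hypothetical placeholders for the two evolved prediction trees.
def predict_from_histogram(f: Features) -> float:
    # 8 bits/byte of order-0 entropy leaves nothing for an order-0 coder to save.
    return clamp(1.0 - f["entropy"] / 8.0)


def predict_from_raw(f: Features) -> float:
    # Small jumps between neighbouring bytes are taken as a sign of redundancy.
    return clamp(1.0 - f["mean_abs_diff"] / 128.0)


def decision_tree(hist: Features, raw: Features, p_hist: float, p_raw: float) -> float:
    # Hand-written stand-in for the evolved decision component: compare features,
    # then return one predictor's output or a formula combining both outputs.
    if hist["distinct"] < 16:
        return p_hist
    if raw["stdev"] > 64.0:
        return p_raw
    return (p_hist + p_raw) / 2.0


def predict_saving(data: bytes) -> float:
    """Run the full (hypothetical) individual on one file's contents."""
    if not data:
        raise ValueError("empty input")
    hist, raw = histogram_features(data), raw_features(data)
    return decision_tree(hist, raw, predict_from_histogram(hist), predict_from_raw(raw))
```

In a GP setting one would presumably score an individual by the error between `predict_saving(data)` and `compression_saving(data)` over a training set of files, then test it on unseen files. The prediction itself never runs the compressor, which is the point of the approach.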




Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kattan, A., Poli, R. (2010). Genetic-Programming Based Prediction of Data Compression Saving. In: Collet, P., Monmarché, N., Legrand, P., Schoenauer, M., Lutton, E. (eds) Artifical Evolution. EA 2009. Lecture Notes in Computer Science, vol 5975. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14156-0_16


  • DOI: https://doi.org/10.1007/978-3-642-14156-0_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-14155-3

  • Online ISBN: 978-3-642-14156-0

  • eBook Packages: Computer Science, Computer Science (R0)
