Abstract
Data compression is important today and will become even more important in the future. Textual data use only a limited alphabet, that is, a limited set of symbols (letters, digits, diacritics, punctuation, spaces, etc.). In most languages, letters are joined into syllables and words, so text can also be compressed over syllables or words instead of single characters. Each of these approaches has pros and cons, and none of them is the best for every file. This paper describes a variant of an algorithm for evolving an alphabet from characters and 2-grams that is optimal for the text file being compressed. The efficiency of the new variant is tested on three compression algorithms, and a new compression algorithm based on LZ77 is also used with this approach.
This work was supported by the Grant Agency of the Czech Republic, under the grant no. P202/11/P142.
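The idea of an alphabet built from characters and 2-grams can be illustrated with a minimal sketch. It is not the evolutionary algorithm or the tabu set described in the paper: here the 2-grams are simply picked by frequency, the text is parsed greedily, and the function names (`top_bigrams`, `tokenize`, `entropy_bits`) are hypothetical. The entropy figure is only a rough zero-order estimate and ignores the cost of storing the alphabet itself.

```python
from collections import Counter
from math import log2


def top_bigrams(text, k):
    """Return the k most frequent 2-grams in text.

    Assumption: a plain frequency heuristic stands in for the
    evolutionary search that the paper uses to choose 2-grams.
    """
    counts = Counter(text[i:i + 2] for i in range(len(text) - 1))
    return {gram for gram, _ in counts.most_common(k)}


def tokenize(text, bigrams):
    """Greedy left-to-right parse over a mixed alphabet.

    Emits a 2-gram symbol when the next two characters form a 2-gram
    in the alphabet, otherwise emits a single character.
    """
    symbols = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if len(pair) == 2 and pair in bigrams:
            symbols.append(pair)
            i += 2
        else:
            symbols.append(text[i])
            i += 1
    return symbols


def entropy_bits(symbols):
    """Zero-order entropy estimate of the symbol stream, in total bits."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c * log2(c / n) for c in counts.values())


if __name__ == "__main__":
    text = "the cat sat on the mat and then the cat ran"
    chars_only = tokenize(text, set())
    mixed = tokenize(text, top_bigrams(text, 4))
    print(len(chars_only), entropy_bits(chars_only))
    print(len(mixed), entropy_bits(mixed))
```

Comparing the two estimates for different values of `k` shows why the choice of 2-grams matters: adding a 2-gram shortens the symbol stream but enlarges the alphabet, so a given set can help one file and hurt another.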
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Platos, J., Kromer, P. (2012). Improving Evolved Alphabet Using Tabu Set. In: Corchado, E., Snášel, V., Abraham, A., Woźniak, M., Graña, M., Cho, SB. (eds) Hybrid Artificial Intelligent Systems. HAIS 2012. Lecture Notes in Computer Science, vol 7208. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28942-2_59
DOI: https://doi.org/10.1007/978-3-642-28942-2_59
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28941-5
Online ISBN: 978-3-642-28942-2