Abstract
Several approaches to data compression have been developed in the past. The basic approach uses characters as the basic compression unit, but syllable-based and word-based approaches have also been developed. These approaches define strict borders between basic units, and these borders are valid only for the tested collections. Moreover, there may be words that are not syllables but are still useful as units in a syllable-based or even a character-based approach. Testing all possibilities is not feasible in a reasonable amount of time, so an optimization technique may be used as a possible solution. This paper describes the first steps toward an optimal compression alphabet: designing the basic algorithms for alphabet reduction using genetic algorithms.
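The idea of searching for a compression alphabet with a genetic algorithm can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not specify the encoding, fitness function, or operators, so everything below is an assumption made for illustration. Here a chromosome is a bit mask over a list of candidate multi-character units, the text is split by greedy longest match, and the fitness is a crude size estimate (zero-order entropy of the token stream plus 8 bits per character for storing the alphabet). The names `tokenize`, `estimated_bits`, and `reduce_alphabet` are hypothetical.

```python
import math
import random
from collections import Counter


def tokenize(text, units):
    """Greedy longest-match split into multi-character units or single characters."""
    max_len = max((len(u) for u in units), default=1)
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first; only units of length >= 2 are looked up.
        for n in range(min(max_len, len(text) - i), 1, -1):
            if text[i:i + n] in units:
                tokens.append(text[i:i + n])
                i += n
                break
        else:
            tokens.append(text[i])  # fall back to a single character
            i += 1
    return tokens


def estimated_bits(text, units):
    """Assumed fitness: zero-order entropy of the tokens plus alphabet storage cost."""
    counts = Counter(tokenize(text, units))
    total = sum(counts.values())
    payload = -sum(c * math.log2(c / total) for c in counts.values())
    alphabet_cost = 8 * sum(len(symbol) for symbol in counts)
    return payload + alphabet_cost


def reduce_alphabet(text, candidates, generations=30, pop_size=20,
                    mutation_rate=0.05, seed=0):
    """Search for a subset of candidate units that minimises estimated_bits."""
    rng = random.Random(seed)
    n = len(candidates)

    def cost(mask):
        units = {c for c, keep in zip(candidates, mask) if keep}
        return estimated_bits(text, units)

    # Seed the population with the character-only alphabet plus random subsets.
    population = [tuple([0] * n)] + [
        tuple(rng.randint(0, 1) for _ in range(n)) for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        population.sort(key=cost)
        survivors = population[: pop_size // 2]  # truncation selection, elitist
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n) if n > 1 else 0  # one-point crossover
            child = a[:cut] + b[cut:]
            # Flip each bit independently with probability mutation_rate.
            child = tuple(bit ^ (rng.random() < mutation_rate) for bit in child)
            children.append(child)
        population = survivors + children
    best = min(population, key=cost)
    return [c for c, keep in zip(candidates, best) if keep], cost(best)
```

Calling `reduce_alphabet(text, candidates)` returns the selected units and their estimated cost. Because the character-only alphabet is in the initial population and the best individuals always survive, the result is never worse than the character-only baseline under this particular cost estimate. A real compressor would replace `estimated_bits` with the size produced by the actual coder.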
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Platos, J., Kromer, P. (2011). Reducing Alphabet Using Genetic Algorithms. In: Snasel, V., Platos, J., El-Qawasmeh, E. (eds) Digital Information Processing and Communications. ICDIPC 2011. Communications in Computer and Information Science, vol 189. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22410-2_7
DOI: https://doi.org/10.1007/978-3-642-22410-2_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-22409-6
Online ISBN: 978-3-642-22410-2
eBook Packages: Computer Science (R0)