Abstract
Arabic is the sixth most used language in the world today. It is also used by the United Nations. Moreover, the Arabic alphabet is the second most widely used alphabet in the world. Computer processing of the Arabic language and the Arabic alphabet is therefore an increasingly important task. Several books on the analysis of the Arabic language have been published in the past, but language analysis is only one step in language processing. Several approaches have been developed in the field of text compression. The first and most intuitive is character-based compression, which is suitable for small files. A second approach, word-based compression, is well suited to very long files. The third approach, syllable-based compression, uses the syllable as its basic element. Syllabification algorithms for English, German, and other European languages are well known, but syllabification algorithms for Arabic and their use in text compression have not been investigated in depth. This paper describes a new and very simple algorithm for the syllabification of Arabic and its use in text compression.
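The trade-off between the three approaches can be made concrete with a small sketch. It is not the algorithm described in the paper: the function names are hypothetical, the syllable splitter is a toy rule for Latin-script text (cut after each run of lowercase vowels) standing in for a real syllabifier, and the zeroth-order entropy is only a rough proxy for what an actual compressor would achieve. The point it illustrates is that larger units yield fewer symbols to encode but a larger alphabet to store or learn, which is why character models tend to suit small files and word models long ones.

```python
import math
import re
from collections import Counter


def entropy_bits(symbols):
    """Zeroth-order empirical entropy of a symbol sequence, in bits per symbol."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def char_symbols(text):
    """Character-based model: every character is one symbol."""
    return list(text)


def word_symbols(text):
    """Word-based model: alternating word and non-word runs (lossless)."""
    return re.findall(r"\w+|\W+", text)


def toy_syllable_symbols(text):
    """Toy syllable-like model (an assumption, NOT the paper's Arabic rules):
    cut after each run of lowercase vowels; leftover consonants and
    non-word runs become symbols of their own, so the split is lossless."""
    return re.findall(r"[^aeiou\W]*[aeiou]+|[^aeiou\W]+|\W+", text)


def model_summary(text, tokenizer):
    """Number of symbols, alphabet size, and entropy for one text model."""
    symbols = tokenizer(text)
    return {
        "symbols": len(symbols),
        "alphabet": len(set(symbols)),
        "bits_per_symbol": entropy_bits(symbols),
    }


if __name__ == "__main__":
    sample = "compression of text using characters, syllables and words"
    for tokenizer in (char_symbols, toy_syllable_symbols, word_symbols):
        print(tokenizer.__name__, model_summary(sample, tokenizer))
```

Each tokenizer is lossless in the sense that joining its symbols reproduces the input, so the three models can be compared on the same text.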
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Soori, H., Platos, J., Snasel, V., Abdulla, H. (2011). Simple Rules for Syllabification of Arabic Texts. In: Snasel, V., Platos, J., El-Qawasmeh, E. (eds) Digital Information Processing and Communications. ICDIPC 2011. Communications in Computer and Information Science, vol 188. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22389-1_9
DOI: https://doi.org/10.1007/978-3-642-22389-1_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-22388-4
Online ISBN: 978-3-642-22389-1
eBook Packages: Computer Science, Computer Science (R0)