
Simple Rules for Syllabification of Arabic Texts

  • Conference paper
Digital Information Processing and Communications (ICDIPC 2011)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 188)

Abstract

The Arabic language is the sixth most widely used language in the world today, and it is one of the official languages of the United Nations. Moreover, the Arabic alphabet is the second most widely used alphabet in the world. Computer processing of the Arabic language and the Arabic alphabet is therefore an increasingly important task. Several books on the analysis of Arabic have been published, but language analysis is only one step in language processing. In the field of text compression, several approaches have been developed. The first and most intuitive is character-based compression, which is suitable for small files. A second approach, word-based compression, is well suited to very long files. The third approach, syllable-based compression, uses the syllable as its basic element. Syllabification algorithms for English, German, and other European languages are well known, but syllabification algorithms for Arabic and their use in text compression have not been investigated in depth. This paper describes a new and very simple algorithm for the syllabification of Arabic and its use in text compression.
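The abstract contrasts three choices of basic unit for a compressor: characters, words, and syllables. The paper's own syllabification rules are not reproduced on this page, so the sketch below uses a hypothetical stand-in that is not the authors' algorithm: on fully vocalized (diacritized) text, every letter carrying a short vowel or tanween opens a new syllable, and any other letter joins the current one. It only illustrates how one text becomes a sequence of characters, syllables, or words; the names `clusters`, `syllabify_word`, `tokenize`, and `SAMPLE` are illustrative assumptions.

```python
# Sketch: one text viewed as characters, syllables, or words.
# The syllable rule is a HYPOTHETICAL stand-in, not the paper's algorithm.

# Arabic diacritics: U+064B..U+0650 are tanween (fathatan, dammatan, kasratan)
# and the short vowels (fatha, damma, kasra); U+0651 is shadda, U+0652 sukun.
VOWEL_MARKS = {chr(c) for c in range(0x064B, 0x0651)}
ALL_MARKS = {chr(c) for c in range(0x064B, 0x0653)}

# "kataba at-taalibu ad-darsa" (the student wrote the lesson), fully vocalized.
SAMPLE = (
    "\u0643\u064e\u062a\u064e\u0628\u064e "                          # kataba
    "\u0627\u0644\u0637\u0651\u064e\u0627\u0644\u0650\u0628\u064f "  # at-taalibu
    "\u0627\u0644\u062f\u0651\u064e\u0631\u0652\u0633\u064e"         # ad-darsa
)


def clusters(word: str) -> list[str]:
    """Attach each diacritic to the base letter it follows."""
    out: list[str] = []
    for ch in word:
        if ch in ALL_MARKS and out:
            out[-1] += ch
        else:
            out.append(ch)
    return out


def syllabify_word(word: str) -> list[str]:
    """Hypothetical rule: a letter carrying a short vowel or tanween opens a
    new syllable; any other letter (with sukun, a bare long-vowel letter, or
    no diacritic at all) is appended to the current syllable."""
    syllables: list[str] = []
    for cl in clusters(word):
        opens = any(mark in VOWEL_MARKS for mark in cl[1:])
        if opens or not syllables:
            syllables.append(cl)
        else:
            syllables[-1] += cl
    return syllables


def tokenize(text: str, unit: str) -> list[str]:
    """Split text into 'char', 'syllable', or 'word' tokens.
    Whitespace is dropped in every view so the counts are comparable;
    a real compressor would also have to encode it."""
    words = text.split()
    if unit == "char":
        return [ch for ch in text if not ch.isspace()]
    if unit == "syllable":
        return [s for w in words for s in syllabify_word(w)]
    if unit == "word":
        return words
    raise ValueError(f"unknown unit: {unit!r}")


if __name__ == "__main__":
    for unit in ("char", "syllable", "word"):
        tokens = tokenize(SAMPLE, unit)
        print(f"{unit:>8}: {len(tokens):2d} tokens, {len(set(tokens)):2d} distinct")
```

Syllables sit between the two extremes in sequence length. On a realistic corpus the hoped-for benefit is also a vocabulary much smaller than the word vocabulary, although a three-word sample is too small to show that effect. Because most Arabic text is written without diacritics, this particular rule collapses to one syllable per word on ordinary input, so a practical Arabic syllabifier would need rules that do not depend on them.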

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Soori, H., Platos, J., Snasel, V., Abdulla, H. (2011). Simple Rules for Syllabification of Arabic Texts. In: Snasel, V., Platos, J., El-Qawasmeh, E. (eds) Digital Information Processing and Communications. ICDIPC 2011. Communications in Computer and Information Science, vol 188. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22389-1_9

  • DOI: https://doi.org/10.1007/978-3-642-22389-1_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-22388-4

  • Online ISBN: 978-3-642-22389-1

  • eBook Packages: Computer Science, Computer Science (R0)
