Simple Rules for Syllabification of Arabic Texts

Soori, Hussein; Platos, Jan; Snasel, Vaclav; Abdulla, Hussam

doi:10.1007/978-3-642-22389-1_9

Part of the book series: Communications in Computer and Information Science (CCIS, volume 188)

Included in the following conference series:

International Conference on Digital Information Processing and Communications

Abstract

The Arabic language is the sixth most used language in the world today, and it is also used by the United Nations. Moreover, the Arabic alphabet is the second most widely used alphabet in the world. Computer processing of the Arabic language and the Arabic alphabet is therefore an increasingly important task. Several books on the analysis of the Arabic language have been published, but language analysis is only one step in language processing. In the field of text compression, several approaches have been developed. The first and most intuitive is character-based compression, which is suitable for small files. Another approach, word-based compression, is well suited to very long files. The third approach, syllable-based compression, uses the syllable as its basic element. Syllabification algorithms for English, German, and other European languages are well known, but syllabification algorithms for Arabic and their use in text compression have not been investigated in depth. This paper describes a new and very simple algorithm for the syllabification of Arabic and its use in text compression.
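
As a rough illustration of the three kinds of basic element named in the abstract (character, word, syllable), the sketch below splits a short sample into each unit type. It is not the algorithm described in the paper, which the abstract does not specify. It assumes a simplified Latin transliteration in which the vowels are a, i, u and long vowels are written doubled, and it uses the common onset rule that every syllable begins with a single consonant. All function names and the sample text are hypothetical.

```python
import re

# Assumption: simplified Latin transliteration, vowels a/i/u, long vowels doubled.
VOWELS = set("aiu")


def characters(text):
    """Character-based units: every character is one symbol."""
    return list(text)


def words(text):
    """Word-based units: alternating runs of non-space and space characters."""
    return re.findall(r"\S+|\s+", text)


def syllables(word):
    """Toy onset rule: start a new syllable at each consonant followed by a vowel."""
    parts, start = [], 0
    for i in range(1, len(word) - 1):
        if word[i] not in VOWELS and word[i + 1] in VOWELS:
            parts.append(word[start:i])
            start = i
    parts.append(word[start:])
    return parts


def syllable_units(text):
    """Syllable-based units: syllabify words, keep whitespace runs as they are."""
    units = []
    for token in words(text):
        units.extend([token] if token.isspace() else syllables(token))
    return units


sample = "kitaab maktab madrasa"  # hypothetical transliterated sample
print(len(characters(sample)), "character units")
print(len(words(sample)), "word units")
print(syllable_units(sample))
```

In every case, joining the units back together reproduces the input exactly, which is the property a compressor needs from its tokenizer. A syllabifier for real Arabic script would also have to deal with unwritten short vowels and diacritics, which this sketch ignores.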

Author information

Authors and Affiliations

Department of Computer Science, Faculty of Electrical Engineering and Computer Science, VSB-Technical University of Ostrava, 17. listopadu 15, 70833, Ostrava, Czech Republic
Hussein Soori, Jan Platos, Vaclav Snasel & Hussam Abdulla

Editor information

Editors and Affiliations

Faculty of Electrical Engineering and Computer Science, VSB-Technical University of Ostrava, VŠB-TUO, 17. listopadu 15, 708 33, Ostrava-Poruba, Czech Republic
Vaclav Snasel & Jan Platos
Information Systems Department, King Saud University, 11543, Riyadh, Saudi Arabia
Eyas El-Qawasmeh

About this paper

Cite this paper

Soori, H., Platos, J., Snasel, V., Abdulla, H. (2011). Simple Rules for Syllabification of Arabic Texts. In: Snasel, V., Platos, J., El-Qawasmeh, E. (eds) Digital Information Processing and Communications. ICDIPC 2011. Communications in Computer and Information Science, vol 188. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-22389-1_9

DOI: https://doi.org/10.1007/978-3-642-22389-1_9
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-22388-4
Online ISBN: 978-3-642-22389-1
eBook Packages: Computer Science, Computer Science (R0)
