ABSTRACT
In this paper we present Morphar+, an Arabic morpho-syntactic analyzer built on top of the free Arabic morphological analyzer AraMorph. AraMorph is known to produce a large number of morphological solutions for each word but little information for selecting the appropriate solution in context. To address this, we first characterized all particles of the Arabic language, the broken noun patterns, and most nominal and verbal sentence structures. We then formulated dozens of rules associated with these descriptions and programmed them in a simple and efficient manner to help deduce not only the appropriate solution but also the case and the case-ending mark of each word. We divided the Arabic particles into groups according to their grammatical functions in order to extract the exact and final morphological function of words. Applying the resulting contextual rules to the output produced by AraMorph yielded an improvement of about 6% in the number of words assigned an accurate morphological function. Our goal is to reduce the error rate to less than 5% so that this fast and accurate morpho-syntactic analyzer can be integrated into a system that translates written Arabic text into Arabic sign language; we believe this will provide enough information to achieve quick and effective translation.
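The idea of using particle groups to filter an analyzer's candidate solutions can be illustrated with a minimal sketch. It is not the Morphar+ implementation and does not use the AraMorph API: the `Analysis` record, the two particle groups, the transliterated tokens, and the single "particle governs the next noun" rule are all assumptions made for illustration. The only grammatical facts relied on are that Arabic prepositions govern the genitive and that particles of the "inna" group put the following noun in the accusative.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Analysis:
    """One candidate solution for a word (hypothetical structure)."""
    lemma: str
    pos: str                      # "NOUN", "VERB" or "PART"
    case: Optional[str] = None    # filled in by a contextual rule


# Hypothetical particle groups, keyed by grammatical function.
# Tokens are illustrative transliterations, not a complete inventory.
PARTICLE_GROUPS = {
    "preposition": {"fy", "mn"},
    "inna_group": {"<n"},
}

# Effect of each group on the next word: expected POS and assigned case.
GROUP_EFFECT = {
    "preposition": ("NOUN", "genitive"),
    "inna_group": ("NOUN", "accusative"),
}


def particle_group(token: str) -> Optional[str]:
    """Return the name of the group a token belongs to, if any."""
    for group, members in PARTICLE_GROUPS.items():
        if token in members:
            return group
    return None


def disambiguate(tokens: List[str],
                 candidates: Dict[str, List[Analysis]]) -> List[Analysis]:
    """Pick one analysis per token using the preceding particle, if any."""
    result = []
    prev_group = None
    for token in tokens:
        options = candidates[token]
        chosen = options[0]  # fallback: keep the analyzer's first solution
        if prev_group is not None:
            wanted_pos, case = GROUP_EFFECT[prev_group]
            for analysis in options:
                if analysis.pos == wanted_pos:
                    # Select the matching solution and attach the case.
                    chosen = replace(analysis, case=case)
                    break
        result.append(chosen)
        prev_group = particle_group(token)
    return result


# Toy candidate lists standing in for a morphological analyzer's output.
CANDIDATES = {
    "fy": [Analysis("fiy", "PART")],
    "ktb": [Analysis("kataba", "VERB"), Analysis("kutub", "NOUN")],
}
```

With these toy candidates, `disambiguate(["fy", "ktb"], CANDIDATES)` selects the noun reading of `ktb` and marks it genitive, whereas `disambiguate(["ktb"], CANDIDATES)` falls back to the first (verbal) solution. A realistic rule set would also need to handle intervening words, sentence structure, and broken noun patterns, which this sketch ignores.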
Index Terms
- Morphar+: an Arabic morphosyntactic analyzer