Abstract
Arabic parsing is an important task in several NLP applications. To obtain a robust, efficient and extensible parser that handles a range of phenomena, several issues, such as ambiguity and embedded structures, must be resolved. In this context, we build an Arabic parser based on two components: a deep linguistic study, conducted from a new perspective that allows the problem to be decomposed, and a transducer cascade implemented in the NooJ linguistic platform. The parser relies on dictionaries, morphological grammars and transducers that we designed to recognize different sentence forms. We apply the parser to two test corpora containing more than 5900 sentences with different structures. The parser outputs XML-annotated sentences. To evaluate the results, we calculated precision, recall and F-measure, and compared them with the values obtained by a recursive transducer parser. The calculated values show that the results are encouraging.
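The evaluation measures named in the abstract can be illustrated with a minimal sketch. The abstract does not state which unit is counted (sentences, phrases or annotations), so the three counts below are assumptions, and the function name `precision_recall_f` and the sample numbers are hypothetical, not taken from the paper.

```python
def precision_recall_f(correct, returned, expected):
    """Return (precision, recall, F-measure) from three raw counts.

    correct  -- items the parser annotated correctly (assumed unit: annotation)
    returned -- all items the parser annotated
    expected -- all items in the reference annotation
    """
    # Guard against empty denominators so the function never divides by zero.
    precision = correct / returned if returned else 0.0
    recall = correct / expected if expected else 0.0
    total = precision + recall
    # Balanced F-measure: harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


if __name__ == "__main__":
    # Made-up counts for illustration only; these are not results from the paper.
    p, r, f = precision_recall_f(correct=80, returned=100, expected=160)
    print(f"precision={p:.3f} recall={r:.3f} f-measure={f:.3f}")
```

The same function could be called once per parser (cascade and recursive transducer) on identical test corpora, so that the two sets of values are directly comparable.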
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Ghezaiel Hammouda, N., Torjmen, R., Haddar, K. (2018). Transducer Cascade to Parse Arabic Corpora. In: Silberztein, M., Atigui, F., Kornyshova, E., Métais, E., Meziane, F. (eds) Natural Language Processing and Information Systems. NLDB 2018. Lecture Notes in Computer Science(), vol 10859. Springer, Cham. https://doi.org/10.1007/978-3-319-91947-8_22
DOI: https://doi.org/10.1007/978-3-319-91947-8_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-91946-1
Online ISBN: 978-3-319-91947-8
eBook Packages: Computer Science, Computer Science (R0)