Abstract
Language resources such as grammars and dictionaries are essential to any natural language processing application. Unfortunately, constructing these resources manually is laborious and time-consuming. Using an annotated corpus as a knowledge base can speed up the construction of a grammar for a given language. In this paper, we present a framework that automatically induces a syntactic grammar, in our case a probabilistic context-free grammar (PCFG), from an annotated Arabic corpus, the Penn Arabic Treebank. The system lets the user build a PCFG from the syntactic trees of the annotated corpus. It also offers the possibility of parsing Arabic sentences using the generated resource. Finally, we present evaluation results.
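To make the idea of inducing a PCFG from treebank trees concrete, the sketch below uses plain relative-frequency estimation: each rule's probability is its count divided by the count of its left-hand side. This is a common baseline and an assumption here; the abstract does not state which estimator the framework uses. The bracketed trees, the placeholder tokens `w1`..`w5`, and the function names are invented for illustration and are not taken from the Penn Arabic Treebank.

```python
from collections import Counter


def parse_tree(bracketed):
    """Parse a Penn-style bracketed string into nested (label, children) tuples."""
    tokens = bracketed.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        assert tokens[pos] == "(", "expected an opening bracket"
        pos += 1
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())       # nested constituent
            else:
                children.append(tokens[pos])  # terminal word
                pos += 1
        pos += 1  # consume the closing bracket
        return (label, children)

    return read()


def productions(tree):
    """Yield every (lhs, rhs) rule used in a tree, top-down."""
    label, children = tree
    rhs = tuple(c[0] if isinstance(c, tuple) else c for c in children)
    yield (label, rhs)
    for child in children:
        if isinstance(child, tuple):
            yield from productions(child)


def induce_pcfg(bracketed_trees):
    """Estimate P(lhs -> rhs) = count(lhs -> rhs) / count(lhs)."""
    rule_counts = Counter()
    lhs_counts = Counter()
    for bracketed in bracketed_trees:
        for lhs, rhs in productions(parse_tree(bracketed)):
            rule_counts[(lhs, rhs)] += 1
            lhs_counts[lhs] += 1
    return {rule: n / lhs_counts[rule[0]] for rule, n in rule_counts.items()}


# Hypothetical toy treebank; w1..w5 stand in for real tokens.
TOY_TREES = [
    "(S (VP (VBD w1) (NP (NN w2))))",
    "(S (VP (VBD w3) (NP (DT w4) (NN w5))))",
]

grammar = induce_pcfg(TOY_TREES)
for (lhs, rhs), prob in sorted(grammar.items()):
    print(f"{lhs} -> {' '.join(rhs)}  [{prob:.2f}]")
```

In this toy treebank `NP` is expanded once as `NN` and once as `DT NN`, so each rule receives probability 0.5, while `S -> VP` is the only expansion of `S` and receives 1.0. A real system would also need smoothing for unseen words and a probabilistic parser (for example CKY) to apply the induced grammar to new sentences.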
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Khoufi, N., Aloulou, C., Hadrich Belguith, L. (2018). A Framework for Language Resource Construction and Syntactic Analysis: Case of Arabic. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2016. Lecture Notes in Computer Science, vol 9623. Springer, Cham. https://doi.org/10.1007/978-3-319-75477-2_25
DOI: https://doi.org/10.1007/978-3-319-75477-2_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-75476-5
Online ISBN: 978-3-319-75477-2
eBook Packages: Computer Science (R0)