Abstract
In this paper, we present a machine learning method for the Arabic native language identification task. We propose a hybrid method that combines surface analysis of texts with an automatic learning method. Unlike the few existing techniques in the state of the art, our method includes a feature selection phase, which improved performance. We also show the impact of syntactic features on the native language identification task. The obtained results outperform those of the best methods previously used for Arabic native language detection.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Mechti, S., Khoufi, N., Hadrich Belguith, L. (2020). Improving Native Language Identification Model with Syntactic Features: Case of Arabic. In: Abraham, A., Cherukuri, A., Melin, P., Gandhi, N. (eds) Intelligent Systems Design and Applications. ISDA 2018. Advances in Intelligent Systems and Computing, vol 941. Springer, Cham. https://doi.org/10.1007/978-3-030-16660-1_20
DOI: https://doi.org/10.1007/978-3-030-16660-1_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-16659-5
Online ISBN: 978-3-030-16660-1
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)