Improving Native Language Identification Model with Syntactic Features: Case of Arabic

Mechti, Seifeddine; Khoufi, Nabil; Hadrich Belguith, Lamia

doi:10.1007/978-3-030-16660-1_20

Conference paper
First Online: 14 April 2019

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 941)

Abstract

In this paper, we present a machine learning method for the Arabic native language identification task. We propose a hybrid method that combines surface analysis of texts with an automatic learning method. Unlike the few techniques found in the state of the art, our approach includes a feature selection phase, which improved performance. We also show the impact of syntactic features on the native language identification task. The obtained results outperform those of the best existing methods for Arabic native language detection.
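The abstract names three ingredients: surface features, syntactic features, and a feature selection phase. This preview does not give the paper's actual feature set, parser, or selection criterion, so the following is only a minimal sketch under assumptions: character bigrams stand in for surface features, POS-tag bigrams stand in for syntactic features, and a chi-square presence test stands in for the selection step. All function names, the transliterated tokens, and the labels `L1_A` and `L1_B` are hypothetical toy values, not data or results from the paper.

```python
from collections import Counter


def surface_features(tokens, n=2):
    """Character n-grams of each token (assumed surface features)."""
    feats = Counter()
    for tok in tokens:
        for i in range(len(tok) - n + 1):
            feats["char:" + tok[i:i + n]] += 1
    return feats


def syntactic_features(pos_tags):
    """POS-tag bigrams, a cheap stand-in for parser-derived features."""
    return Counter("pos:" + a + "_" + b for a, b in zip(pos_tags, pos_tags[1:]))


def extract(tokens, pos_tags):
    """Merge both feature families into one sparse count vector."""
    return surface_features(tokens) + syntactic_features(pos_tags)


def chi_square_scores(vectors, labels):
    """Chi-square of feature presence against class label, per feature."""
    n = len(vectors)
    class_sizes = Counter(labels)
    present = Counter()      # feature -> number of documents containing it
    present_in = Counter()   # (feature, label) -> documents containing it
    for vec, lab in zip(vectors, labels):
        for f in vec:
            present[f] += 1
            present_in[f, lab] += 1
    scores = {}
    for f, df in present.items():
        score = 0.0
        for c, size in class_sizes.items():
            # One cell for "feature present", one for "feature absent".
            cells = ((present_in[f, c], df), (size - present_in[f, c], n - df))
            for observed, row_total in cells:
                expected = row_total * size / n
                if expected > 0:
                    score += (observed - expected) ** 2 / expected
        scores[f] = score
    return scores


def select_top_k(scores, k):
    """Keep the k highest-scoring features (ties broken alphabetically)."""
    return sorted(scores, key=lambda f: (-scores[f], f))[:k]


# Toy corpus (hypothetical): the two classes differ only in word order,
# so only the syntactic features can separate them.
corpus = [
    (["kitab", "kabir"], ["NOUN", "ADJ"], "L1_A"),
    (["bayt", "jadid"], ["NOUN", "ADJ"], "L1_A"),
    (["kabir", "kitab"], ["ADJ", "NOUN"], "L1_B"),
    (["jadid", "bayt"], ["ADJ", "NOUN"], "L1_B"),
]
vectors = [extract(tokens, tags) for tokens, tags, _ in corpus]
labels = [label for _, _, label in corpus]
scores = chi_square_scores(vectors, labels)
selected = select_top_k(scores, 2)
print(selected)  # ['pos:ADJ_NOUN', 'pos:NOUN_ADJ']
```

In this toy setup the character bigrams score zero because they occur equally often in both classes, while the two POS bigrams are retained. The reduced vectors would then be passed to a classifier; the paper's notes point to LIBSVM, but how it is configured there is not stated in this preview.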

Notes

1. http://www.csie.ntu.edu.tw/~cjlin/libsvm

Author information

Authors and Affiliations

LARODEC Laboratory, ISG of Tunis, University of Tunis, Tunis, Tunisia
Seifeddine Mechti
ANLP Research Group, MIRACL Laboratory, IHE of Sfax, University of Sfax, Sfax, Tunisia
Nabil Khoufi
ANLP Research Group, MIRACL Laboratory, FSEG of Sfax, University of Sfax, Sfax, Tunisia
Lamia Hadrich Belguith

Corresponding author

Correspondence to Nabil Khoufi.

Editor information

Editors and Affiliations

Machine Intelligence Research Labs, Auburn, WA, USA
Ajith Abraham
School of Information Technology and Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India
Aswani Kumar Cherukuri
Tijuana Institute of Technology, Tijuana, Mexico
Patricia Melin
Machine Intelligence Research Labs, Auburn, WA, USA
Niketa Gandhi

About this paper

Cite this paper

Mechti, S., Khoufi, N., Hadrich Belguith, L. (2020). Improving Native Language Identification Model with Syntactic Features: Case of Arabic. In: Abraham, A., Cherukuri, A., Melin, P., Gandhi, N. (eds) Intelligent Systems Design and Applications. ISDA 2018. Advances in Intelligent Systems and Computing, vol 941. Springer, Cham. https://doi.org/10.1007/978-3-030-16660-1_20

DOI: https://doi.org/10.1007/978-3-030-16660-1_20
Published: 14 April 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-16659-5
Online ISBN: 978-3-030-16660-1
eBook Packages: Intelligent Technologies and Robotics (R0)