Improving Native Language Identification Model with Syntactic Features: Case of Arabic

  • Conference paper

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 941)

Abstract

In this paper, we present a machine learning method for the Arabic native language identification task. We propose a hybrid method that combines surface analysis of texts with an automatic learning method. Unlike the few techniques found in the state of the art, our method includes a feature selection phase, which improved performance. We also show the impact of syntactic features on the native language identification task. The obtained results outperform those of the best methods previously used for Arabic native language detection.
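
The abstract does not say which features or which selection criterion were used, so the following is only a minimal sketch of the general idea. It assumes character trigrams as surface features, POS-tag bigrams as syntactic features, and chi-square ranking for feature selection. All function names, the tag labels, and the toy data are hypothetical, the Latin strings merely stand in for learner texts, and the classifier training step (for example, an SVM) is left out.

```python
def char_ngrams(text, n=3):
    """Surface features: overlapping character n-grams of a text."""
    return ["c:" + text[i:i + n] for i in range(len(text) - n + 1)]


def tag_ngrams(tags, n=2):
    """Syntactic features: n-grams over a POS-tag sequence.

    The tags are assumed to come from an external tagger or parser.
    """
    return ["t:" + "_".join(tags[i:i + n]) for i in range(len(tags) - n + 1)]


def extract(text, tags):
    """Binary feature set for one document (presence, not frequency)."""
    return set(char_ngrams(text)) | set(tag_ngrams(tags))


def chi_square(docs, labels, feature, label):
    """Chi-square statistic of the 2x2 table (feature present?, class == label?)."""
    a = b = c = d = 0
    for feats, y in zip(docs, labels):
        has, pos = feature in feats, y == label
        if has and pos:
            a += 1
        elif has:
            b += 1
        elif pos:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def select_features(docs, labels, k):
    """Keep the k features with the highest chi-square score for any class."""
    vocab = set().union(*docs)
    classes = set(labels)
    score = {f: max(chi_square(docs, labels, f, c) for c in classes) for f in vocab}
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(vocab, key=lambda f: (-score[f], f))[:k]


# Toy data (invented): two documents per hypothetical native language "A" / "B".
docs = [
    extract("abab", ["NOUN", "ADJ", "VERB"]),
    extract("abba", ["NOUN", "ADJ", "NOUN"]),
    extract("cdcd", ["VERB", "NOUN", "NOUN"]),
    extract("cddc", ["VERB", "NOUN", "VERB"]),
]
labels = ["A", "A", "B", "B"]

print(select_features(docs, labels, 2))
```

On this toy data, the two tag bigrams that occur in every document of one class and in none of the other rank highest, which illustrates how a selection step can surface syntactic features. The selected features would then be turned into vectors and passed to a classifier.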

Notes

  1. http://www.csie.ntu.edu.tw/~cjlin/libsvm

Author information

Corresponding author

Correspondence to Nabil Khoufi.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Mechti, S., Khoufi, N., Hadrich Belguith, L. (2020). Improving Native Language Identification Model with Syntactic Features: Case of Arabic. In: Abraham, A., Cherukuri, A., Melin, P., Gandhi, N. (eds) Intelligent Systems Design and Applications. ISDA 2018. Advances in Intelligent Systems and Computing, vol 941. Springer, Cham. https://doi.org/10.1007/978-3-030-16660-1_20
