Research article, DOI: 10.1145/3485557.3485563

Authorship Attribution of Modern Standard Arabic Short Texts

Published: 29 November 2021

Abstract

Text data, including short texts, constitutes a major share of web content. The availability of this data to billions of users invites frequent plagiarism. Authorship Attribution (AA) seeks to identify the most probable author of a given text based on its similarity to the writing styles of candidate authors. In this paper, we approach AA as a writing-style profile generation process in which the text instances of each author are grouped into a single profile. We use Twitter as the source of our short Modern Standard Arabic (MSA) texts. After numerous experiments with various training approaches, tools, and features, we settled on a text representation method that concatenates Arabic tweets into chunks, which are then duplicated to reach a precalculated length. These chunks are used to train machine learning models for our 45 author profiles. This approach achieves accuracies of up to 99%, which compares favorably with the best results reported in the literature.
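The chunk representation described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how many tweets go into a chunk, how the target length is computed, or how duplication is carried out, so the names `build_chunks`, `pad_by_duplication`, `tweets_per_chunk`, and `target_len` are hypothetical, as are the choices to measure length in characters, join tweets with a space, and trim to the exact target length.

```python
from typing import List


def pad_by_duplication(text: str, target_len: int) -> str:
    """Repeat `text` until it is at least `target_len` characters long,
    then trim to exactly `target_len` (trimming is an assumption)."""
    if not text:
        raise ValueError("cannot pad an empty chunk")
    padded = text
    while len(padded) < target_len:
        padded += " " + text  # duplicate the chunk content
    return padded[:target_len]


def build_chunks(tweets: List[str], tweets_per_chunk: int, target_len: int) -> List[str]:
    """Concatenate one author's tweets into fixed-length chunks.

    Assumed procedure: take consecutive groups of `tweets_per_chunk`
    tweets, join each group with spaces, and pad it by duplication.
    """
    chunks = []
    for start in range(0, len(tweets), tweets_per_chunk):
        group = tweets[start:start + tweets_per_chunk]
        chunk = " ".join(t.strip() for t in group if t.strip())
        if chunk:
            chunks.append(pad_by_duplication(chunk, target_len))
    return chunks


# Placeholder strings stand in for one author's tweets.
author_tweets = ["aa bb", "cc", "dd ee ff"]
chunks = build_chunks(author_tweets, tweets_per_chunk=2, target_len=12)
print([len(c) for c in chunks])
```

Each resulting chunk would then serve as one training instance labeled with its author. The author tags mention support vector classification, but the feature extraction and model settings are not given here, so they are left out of the sketch.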



Published In

ArabWIC 2021: The 7th Annual International Conference on Arab Women in Computing in Conjunction with the 2nd Forum of Women in Research
August 2021
145 pages
ISBN: 9781450384186
DOI: 10.1145/3485557

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

1. Arabic Authorship Attribution
2. Arabic Tweets
3. Authorship Attribution Datasets
4. Machine Learning
5. Social Media
6. Support Vector Classification

Qualifiers

• Research-article
• Research
• Refereed limited

Conference

ArabWIC 2021

Acceptance Rates

Overall Acceptance Rate: 20 of 36 submissions, 56%
