Effect of Machine Translation on Authorship Attribution

  • Conference paper
Evolution in Computational Intelligence (FICTA 2023)

Part of the book series: Smart Innovation, Systems and Technologies (SIST, volume 370)

Abstract

This study investigates the effect of automatic machine translation on stylometry. For that purpose, an Arabic corpus called Hundred of Arabic Travelers (HAT), containing texts by 100 authors, is used. All the texts of this corpus, which are written in Arabic, are translated into French using the translation feature of Microsoft Office. An authorship attribution system is then applied to both datasets in order to identify the author of each text before and after translation, and the two datasets are compared on the basis of their attribution scores. Several types of features are tested, namely rare words, words, word bi-grams, word tri-grams, character bi-grams, character tri-grams, and character tetra-grams. These features are used with a centroid-based nearest-neighbor distance for classification. The experimental results show that machine translation reduces stylometric identification performance but preserves some characteristics of the author, so identification remains possible even after translation. With 100 authors, the attribution accuracy on the translated documents is about 80%, while the best accuracy obtained on the original documents is 97%.
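
To make the classification step concrete, the following is a minimal sketch of character n-gram features combined with a nearest-centroid decision. It is an illustration only, not the authors' system: the abstract does not specify the distance measure or the feature normalization, so the Manhattan distance, the relative-frequency profiles, the function names, and the toy English texts below are all assumptions.

```python
from collections import Counter, defaultdict


def char_ngrams(text, n):
    """Return all overlapping character n-grams of `text`."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def profile(text, n=3):
    """Relative-frequency profile of character n-grams (assumed normalization)."""
    counts = Counter(char_ngrams(text, n))
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()} if total else {}


def centroid(profiles):
    """Average several profiles into a single centroid for one author."""
    acc = defaultdict(float)
    for p in profiles:
        for g, v in p.items():
            acc[g] += v
    return {g: v / len(profiles) for g, v in acc.items()}


def manhattan(p, q):
    """L1 distance between two sparse profiles (assumed metric)."""
    return sum(abs(p.get(g, 0.0) - q.get(g, 0.0)) for g in set(p) | set(q))


def train(labelled_texts, n=3):
    """Build one centroid per author from (author, text) pairs."""
    by_author = defaultdict(list)
    for author, text in labelled_texts:
        by_author[author].append(profile(text, n))
    return {a: centroid(ps) for a, ps in by_author.items()}


def attribute(text, centroids, n=3):
    """Assign the author whose centroid is nearest to the text's profile."""
    p = profile(text, n)
    return min(centroids, key=lambda a: manhattan(p, centroids[a]))


# Hypothetical toy data; the real experiments use the HAT corpus.
training = [
    ("A", "the sea was calm and the sea was blue"),
    ("A", "the sea and the sky were calm"),
    ("B", "mountains rise high, mountains stand tall"),
    ("B", "high mountains and tall peaks"),
]
centroids = train(training, n=3)
print(attribute("the calm blue sea", centroids, n=3))
```

Under these assumptions, the before-and-after comparison described in the abstract would amount to running the same procedure twice, once on the original Arabic texts and once on their French translations, and comparing the share of correctly attributed texts. Word-level features would only require replacing `char_ngrams` with an n-gram function over tokens.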

Corresponding author

Correspondence to S. Ouamour.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

Cite this paper

Ouamour, S., Sayoud, H. (2023). Effect of Machine Translation on Authorship Attribution. In: Bhateja, V., Yang, XS., Ferreira, M.C., Sengar, S.S., Travieso-Gonzalez, C.M. (eds) Evolution in Computational Intelligence. FICTA 2023. Smart Innovation, Systems and Technologies, vol 370. Springer, Singapore. https://doi.org/10.1007/978-981-99-6702-5_5