Abstract
This study investigates the effect of automatic machine translation on stylometry. It uses an Arabic corpus called Hundred of Arabic Travelers (HAT), which contains texts by 100 authors. All texts of the corpus, originally written in Arabic, are translated into French with the Microsoft Office translation tool. An authorship attribution system is then applied to both datasets to identify the author of each text before and after translation, and the two datasets are compared on their attribution scores. Several feature types are tested: rare words, words, word bi-grams, word tri-grams, character bi-grams, character tri-grams, and character tetra-grams. These features are classified with a centroid nearest neighbor distance. The experimental results show that machine translation reduces stylometric identification performance but preserves some characteristics of the author, so identification remains possible after translation. With 100 authors, attribution accuracy on the translated documents is about 80%, whereas the best accuracy obtained on the original documents is 97%.
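The pipeline summarized in the abstract (character n-gram features, one centroid per author, attribution to the nearest centroid) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the abstract does not specify the distance measure or feature weighting, so the relative-frequency profiles and the Manhattan distance used here are assumptions, and all function names (`char_ngrams`, `centroid`, `manhattan`, `train`, `attribute`, `accuracy`) are hypothetical.

```python
def char_ngrams(text, n=3):
    """Relative frequencies of the character n-grams in `text`."""
    counts = {}
    for i in range(len(text) - n + 1):
        gram = text[i:i + n]
        counts[gram] = counts.get(gram, 0) + 1
    total = sum(counts.values())
    return {g: c / total for g, c in counts.items()} if total else {}


def centroid(profiles):
    """Average several frequency profiles into one author centroid."""
    acc = {}
    for profile in profiles:
        for gram, freq in profile.items():
            acc[gram] = acc.get(gram, 0.0) + freq
    return {g: v / len(profiles) for g, v in acc.items()}


def manhattan(p, q):
    """Manhattan distance between two sparse profiles (assumed metric)."""
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))


def train(texts_by_author, n=3):
    """Build one centroid per author from that author's training texts."""
    return {
        author: centroid([char_ngrams(t, n) for t in texts])
        for author, texts in texts_by_author.items()
    }


def attribute(text, centroids, n=3):
    """Return the author whose centroid is nearest to the text's profile."""
    profile = char_ngrams(text, n)
    return min(centroids, key=lambda a: manhattan(profile, centroids[a]))


def accuracy(labelled_texts, centroids, n=3):
    """Fraction of (text, true_author) pairs attributed correctly."""
    hits = sum(attribute(t, centroids, n) == a for t, a in labelled_texts)
    return hits / len(labelled_texts)


# Toy data only; it stands in for the HAT corpus, which is not reproduced here.
toy_training = {
    "author_a": ["aaaa aaaa aaab", "aaaa aaba"],
    "author_b": ["zzzz zzzy", "zzyz zzzz"],
}
toy_centroids = train(toy_training, n=3)
toy_test = [("aaaa aab", "author_a"), ("zzzz zy", "author_b")]
toy_accuracy = accuracy(toy_test, toy_centroids, n=3)
```

Under these assumptions, the before/after comparison described in the abstract would amount to running `train` and `accuracy` once on the original Arabic texts and once on their French translations, then comparing the two scores. Word-level features would follow the same pattern with tokens in place of characters.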
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Ouamour, S., Sayoud, H. (2023). Effect of Machine Translation on Authorship Attribution. In: Bhateja, V., Yang, XS., Ferreira, M.C., Sengar, S.S., Travieso-Gonzalez, C.M. (eds) Evolution in Computational Intelligence. FICTA 2023. Smart Innovation, Systems and Technologies, vol 370. Springer, Singapore. https://doi.org/10.1007/978-981-99-6702-5_5
DOI: https://doi.org/10.1007/978-981-99-6702-5_5
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-6701-8
Online ISBN: 978-981-99-6702-5
eBook Packages: Intelligent Technologies and Robotics (R0)