Improved score aggregation for authorship verification

  • Regular Paper
Knowledge and Information Systems

Abstract

The Impostors method is one of the most successful solvers of author verification problems: given a pair of texts, it aims to determine whether or not the same author wrote them. This paper proposes an approach whose primary objective is higher classification accuracy. The gain is achieved by modifying the vector representations of the input texts so that the effect of their possibly belonging to different domains is reduced. The modification factors are obtained through an additional computational step that empirically estimates the expected difference, or ratio, between the questioned texts’ similarity scores against their in-domain samples. Our evaluation confirms that the proposed approach can achieve higher classification accuracy than the original method. Despite the size of the evaluation dataset, some of the accuracy increases are large enough to be statistically significant, very significant, or highly significant.
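For orientation, the following is a rough sketch of the general Impostors idea as it is commonly described: over many random feature subsets, count how often one questioned text is more similar to the other than to every impostor text. It does not implement the modification proposed in this paper. The function names, the use of cosine similarity, the feature fraction of 0.5, and the 100 iterations are all assumptions made for illustration.

```python
import math
import random
from typing import Optional, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def impostors_score(
    x: Vector,
    y: Vector,
    impostors: Sequence[Vector],
    iterations: int = 100,            # assumed value
    feature_fraction: float = 0.5,    # assumed value
    rng: Optional[random.Random] = None,
) -> float:
    """Fraction of random feature subsets in which y is closer to x
    than every impostor is (hypothetical helper, illustration only)."""
    rng = rng or random.Random()
    n_features = len(x)
    subset_size = max(1, int(n_features * feature_fraction))
    wins = 0
    for _ in range(iterations):
        # Pick one random feature subset and apply it to all vectors.
        idx = rng.sample(range(n_features), subset_size)

        def project(v: Vector) -> list:
            return [v[i] for i in idx]

        sim_xy = cosine(project(x), project(y))
        best_impostor = max(cosine(project(x), project(imp)) for imp in impostors)
        if sim_xy > best_impostor:
            wins += 1
    return wins / iterations


# Toy feature vectors (made up for this example).
x = [1.0, 2.0, 3.0, 4.0]
impostors = [[4.0, 3.0, 2.0, 1.0], [1.0, 0.0, 0.0, 1.0]]
same_author_like = impostors_score(x, x, impostors, rng=random.Random(0))
print(same_author_like)
```

A score near 1 suggests the pair behaves like same-author texts under this sketch, and a score near 0 suggests the opposite; a decision threshold would have to be chosen separately.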

Availability of data and materials

The tool used to extract the features from the considered dataset is available at https://gitlab.com/mmaakh/fextractor. The dataset supporting the conclusions of this article is available in the PAN Data repository, https://pan.webis.de/data.html.

Notes

  1. The word shapes are based on three properties: character case (lower or upper), character type (letter or number), and word length. For example, the word “School” is represented as the gram “Cccccc”, “2017” as the gram “NNNN”, and “x86” as the gram “cNN”.

  2. Part-of-speech (POS) tags such as NN, NNS, NNP, VB, VBD, and VBG. For example, if the word “school” appears in a text as a noun, it is represented as the gram “NN”. A comprehensive list of such tags can be found at https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html.

  3. Combinations of words and their respective POS tags. For example, if the word “saw” is used as a noun, it is represented as the gram “saw-NN”, and if it is used as a verb, it is represented as the gram “saw-VBD”.
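The word-shape and word-POS grams described in the notes above can be sketched as follows. This is a minimal illustration, not the extraction tool's actual code: the function names are hypothetical, the handling of characters that are neither letters nor digits is an assumption (they are kept unchanged), and the POS tags are supplied by hand rather than produced by a tagger.

```python
def word_shape(word: str) -> str:
    """Map each character to a shape symbol:
    'C' = upper-case letter, 'c' = other letter, 'N' = digit.
    Assumption: any other character is copied through unchanged."""
    symbols = []
    for ch in word:
        if ch.isdigit():
            symbols.append("N")
        elif ch.isalpha():
            symbols.append("C" if ch.isupper() else "c")
        else:
            symbols.append(ch)
    return "".join(symbols)


def word_pos_gram(word: str, tag: str) -> str:
    """Join a word and its POS tag into a single gram, e.g. 'saw-NN'."""
    return f"{word}-{tag}"


# The examples from the notes.
print(word_shape("School"))          # Cccccc
print(word_shape("2017"))            # NNNN
print(word_shape("x86"))             # cNN
print(word_pos_gram("saw", "NN"))    # saw-NN
print(word_pos_gram("saw", "VBD"))   # saw-VBD
```

Because the shape has one symbol per character, the word length is preserved implicitly.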

Abbreviations

AA: Author attribution

AP: Author profiling

AR: Approximate randomization

AV: Author verification

Author information

Corresponding author

Correspondence to Youssef Iraqi.

About this article

Cite this article

Khonji, M., Iraqi, Y. & Mekouar, L. Improved score aggregation for authorship verification. Knowl Inf Syst 65, 1317–1336 (2023). https://doi.org/10.1007/s10115-022-01798-y

  • DOI: https://doi.org/10.1007/s10115-022-01798-y
