Authorship Verification in the Absence of Explicit Features and Thresholds

  • Conference paper
Advances in Information Retrieval (ECIR 2018)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10772)

Abstract

Enhancing information retrieval systems with the ability to take people's writing style into account opens the door to a number of applications. For example, linking articles by authorship can help identify authors who spread hoaxes and deliberate misinformation in news stories across different platforms. Authorship verification (AV) is a technique that can be used for this purpose. AV deals with the task of judging whether two or more documents stem from the same author. The majority of existing AV approaches rely on machine learning concepts based on explicitly defined stylistic features and on complex models that involve a fair number of parameters. Moreover, many existing AV methods are based on explicit thresholds (needed to accept or reject a stated authorship), which are determined on training corpora. We propose a novel parameter-free AV approach that derives its thresholds for each verification case individually and enables AV in the absence of explicit features and training corpora. In an experimental setup based on eight evaluation corpora (each in a different language), we show that our approach yields competitive results against the current state of the art and other noteworthy AV baselines.

Notes

  1. One-Class Compression Authorship Verifier.

  2. PAN is a series of scientific events and shared tasks on digital text forensics [28].

  3. Homotopy-based Classification is used in face recognition, where the goal is to measure the contribution of known faces in the generation of an unknown face [6].

  4. Available at http://pan.webis.de.

  5. Area Under the ROC Curve.

  6. Prediction by Partial Matching. Note that d refers to the order of the PPM model.

  7. We use Hathcock’s library (https://github.com/adamhathcock/sharpcompress).

  8. “Large Text Compression Benchmark” (http://mattmahoney.net/dc/text.html).

  9. Note that this measure is not a metric, since the conditions a metric must satisfy (identity, symmetry and triangle inequality) are not met.

  10. All corpora are available upon request at http://bit.do/ECIR_2018.

  11. Evidence for this can be seen by comparing the number of published papers across different bibliographic databases such as Google Scholar or Microsoft Academic.

  12. We used Van Asch’s script (http://www.clips.uantwerpen.be/scripts/art).
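
The notes above refer to a compression-based verifier, a PPM compression model, and a dissimilarity measure that is not a metric. As a rough illustration of the general idea only, the sketch below compares two texts through compressed sizes, with no explicit stylistic features. It makes several assumptions that do not come from the paper: it uses the normalized compression distance (NCD) as a stand-in for the paper's measure, zlib as a stand-in for PPM (which is not in the Python standard library), and the hypothetical function names `compressed_size` and `ncd`.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed form of `data`."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance between two texts.

    Assumption: NCD with zlib is a stand-in chosen for illustration;
    it is not the measure or the compressor used in the paper.
    """
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)  # size of the concatenated texts
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    known = ("Authorship verification asks whether two documents were "
             "written by the same person, using only the texts themselves.")
    unknown = ("Zebras quickly jumped over foggy vineyards while exotic "
               "birds sang; boxes of quartz lay hidden beneath the pier.")
    print("same text:      ", ncd(known, known))
    print("different texts:", ncd(known, unknown))
    print("swapped order:  ", ncd(unknown, known))
```

With a real compressor, the value for a text compared with itself is typically small but not exactly zero, and swapping the arguments can change the result slightly. This is one way to see why compression-based measures of this kind generally fail the formal conditions of a metric.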

References

  1. Bagnall, D.: Author identification using multi-headed recurrent neural networks. In: Working Notes of CLEF 2015 - Conference and Labs of the Evaluation Forum, Toulouse, France, 8–11 September 2015

  2. Castillo, E., Cervantes, O., No, D.V., Báez, D.: Author verification using a graph-based representation. Int. J. Comput. Appl. 123(14), 1–8 (2015)

  3. Forner, P., Navigli, R., Tufis, D., Ferro, N. (eds.): Working notes for CLEF 2013 Conference, Valencia, Spain, 23–26 September 2013, CEUR Workshop Proceedings, vol. 1179 (2014). CEUR-WS.org

  4. Halvani, O.: Enron Authorship Verification Corpus, Mendeley Data, v1 (2017)

  5. Halvani, O., Winter, C., Graner, L.: On the usefulness of compression models for authorship verification. In: Proceedings of the 12th International Conference on Availability, Reliability and Security, ARES 2017, pp. 54:1–54:10 (2017)

  6. Hernández, J.G.G., Casillas, J., Ledesma, P., Pineda, G.F., Ruíz, I.V.M.: Homotopy based classification for author verification task: notebook for PAN at CLEF 2015. In: Working Notes of CLEF 2015 - Conference and Labs of the Evaluation Forum, Toulouse, France, 8–11 September 2015

  7. Hürlimann, M., Weck, B., von den Berg, E., Šuster, S., Nissim, M.: GLAD: Groningen lightweight authorship detection. In: Working Notes of CLEF 2015 - Conference and Labs of the Evaluation Forum, Toulouse, France, 8–11 September 2015

  8. Jankowska, M., Keselj, V., Milios, E.E.: Proximity based one-class classification with common N-gram dissimilarity for authorship verification task notebook for PAN at CLEF 2013. In: Forner et al. [3]

  9. Noecker Jr., J., Ryan, M.: Distractorless authorship verification. In: Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC 2012), Istanbul, Turkey, May 2012

  10. Juola, P., Stamatatos, E.: Overview of the author identification task at PAN 2013. In: Forner et al. [3]

  11. Kestemont, M., Stover, J.A., Koppel, M., Karsdorp, F., Daelemans, W.: Authenticating the writings of Julius Caesar. Expert Syst. Appl. 63, 86–96 (2016)

  12. Khonji, M., Iraqi, Y.: A slightly-modified GI-based author-verifier with lots of features (ASGALF). In: Working Notes for CLEF 2014 Conference, Sheffield, UK, 15–18 September 2014, pp. 977–983 (2014)

  13. Kocher, M., Savoy, J.: A simple and efficient algorithm for authorship verification. J. Assoc. Inf. Sci. Technol. 68(1), 259–269 (2017). https://doi.org/10.1002/asi.23648

  14. Koppel, M., Schler, J.: Authorship verification as a one-class classification problem. In: Brodley, C.E. (ed.) Machine Learning, Proceedings of the Twenty-First International Conference (ICML 2004), vol. 69, Banff, Alberta, Canada, 4–8 July 2004. ACM (2004)

  15. Koppel, M., Schler, J., Argamon, S.: Authorship attribution in the wild. Lang. Res. Eval. 45(1), 83–94 (2011)

  16. Koppel, M., Winter, Y.: Determining if two documents are written by the same author. JASIST 65(1), 178–187 (2014)

  17. Moreau, E., Jayapal, A., Lynch, G., Vogel, C.: Author verification: basic stacked generalization applied to predictions from a set of heterogeneous learners-notebook for PAN at CLEF 2015. In: Cappellato, L., Ferro, N., Jones, G., San Juan, E. (eds.) CLEF 2015 Evaluation Labs and Workshop - Working Notes Papers, 8–11 September 2015, Toulouse, France (2015). CEUR-WS.org

  18. Nagaprasad, S., Reddy, V., Babu, A.: Authorship attribution based on data compression for Telugu text. Int. J. Comput. Appl. 110(1), 1–5 (2015)

  19. Potha, N., Stamatatos, E.: A profile-based method for authorship verification. In: Likas, A., Blekas, K., Kalles, D. (eds.) SETN 2014. LNCS (LNAI), vol. 8445, pp. 313–326. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-07064-3_25

  20. Potha, N., Stamatatos, E.: An improved Impostors method for authorship verification. In: Jones, G.J.F., Lawless, S., Gonzalo, J., Kelly, L., Goeuriot, L., Mandl, T., Cappellato, L., Ferro, N. (eds.) CLEF 2017. LNCS, vol. 10456, pp. 138–144. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-65813-1_14

  21. Potthast, M., Kiesel, J., Reinartz, K., Bevendorff, J., Stein, B.: A stylometric inquiry into hyperpartisan and fake news. ArXiv e-prints, February 2017

  22. Rexha, A., Kröll, M., Ziak, H., Kern, R.: Extending scientific literature search by including the author’s writing style. In: Mayr, P., Frommholz, I., Cabanac, G. (eds.) Proceedings of the Fifth Workshop on Bibliometric-Enhanced Information Retrieval (BIR) Co-located with the 39th European Conference on Information Retrieval (ECIR 2017), Aberdeen, UK, 9th April 2017, CEUR Workshop Proceedings, vol. 1823, pp. 93–100 (2017). CEUR-WS.org

  23. Sculley, D., Brodley, C.E.: Compression and machine learning: a new perspective on feature space vectors. In: DCC, pp. 332–332. IEEE Computer Society

  24. Seidman, S.: Authorship verification using the impostors method notebook for PAN at CLEF 2013. In: Forner et al. [3]

  25. Shu, K., Sliva, A., Wang, S., Tang, J., Liu, H.: Fake news detection on social media: a data mining perspective. ACM SIGKDD Explor. Newsl. 19(1), 22–36 (2017)

  26. Stamatatos, E., Daelemans, W., Verhoeven, B., Juola, P., López-López, A., Potthast, M., Stein, B.: Overview of the author identification task at PAN 2015. In: Working Notes of CLEF 2015 - Conference and Labs of the Evaluation Forum, Toulouse, France, 8–11 September 2015

  27. Stamatatos, E., Daelemans, W., Verhoeven, B., Stein, B., Potthast, M., Juola, P., Sánchez-Pérez, M.A., Barrón-Cedeño, A.: Overview of the author identification task at PAN 2014. In: Working Notes for CLEF 2014 Conference, Sheffield, UK, 15–18 September 2014, pp. 877–897 (2014)

  28. Stamatatos, E., Potthast, M., Rangel, F., Rosso, P., Stein, B.: Overview of the PAN/CLEF 2015 evaluation lab. In: Mothe, J., Savoy, J., Kamps, J., Pinel-Sauvagnat, K., Jones, G.J.F., SanJuan, E., Cappellato, L., Ferro, N. (eds.) CLEF 2015. LNCS, vol. 9283, pp. 518–538. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24027-5_49

  29. Stein, B., Lipka, N., Zu Eissen, S.M.: Meta analysis within authorship verification. In: 19th International Workshop on Database and Expert Systems Applications (DEXA 2008), 1–5 September 2008, Turin, Italy, pp. 34–39. IEEE Computer Society (2008)

  30. Tax, D.M.J.: One-class classification: concept learning in the absence of counter-examples. Ph.D. thesis (2001)

Acknowledgments

This work was supported by the German Federal Ministry of Education and Research (BMBF) under the project “DORIAN” (Scrutinise and thwart disinformation). We would like to thank Christian Winter and Felix Mayer for their valuable reviews that helped to improve the quality of this paper.

Author information

Corresponding author

Correspondence to Oren Halvani.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Halvani, O., Graner, L., Vogel, I. (2018). Authorship Verification in the Absence of Explicit Features and Thresholds. In: Pasi, G., Piwowarski, B., Azzopardi, L., Hanbury, A. (eds) Advances in Information Retrieval. ECIR 2018. Lecture Notes in Computer Science, vol 10772. Springer, Cham. https://doi.org/10.1007/978-3-319-76941-7_34

  • DOI: https://doi.org/10.1007/978-3-319-76941-7_34

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-76940-0

  • Online ISBN: 978-3-319-76941-7

  • eBook Packages: Computer Science, Computer Science (R0)
