Authorship Verification in the Absence of Explicit Features and Thresholds

Halvani, Oren; Graner, Lukas; Vogel, Inna

doi:10.1007/978-3-319-76941-7_34

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 10772)

Included in the following conference series:

European Conference on Information Retrieval

Abstract

Enhancing information retrieval systems with the ability to take people's writing style into account opens the door to a number of applications. For example, one can link articles by authorship, which can help to identify authors who generate hoaxes and deliberate misinformation in news stories distributed across different platforms. Authorship verification (AV) is a technique that can be used for this purpose. AV deals with the task of judging whether two or more documents stem from the same author. The majority of existing AV approaches rely on machine learning concepts based on explicitly defined stylistic features and complex models that involve a fair number of parameters. Moreover, many existing AV methods are based on explicit thresholds (needed to accept or reject a stated authorship), which are determined on training corpora. We propose a novel parameter-free AV approach, which derives its thresholds for each verification case individually and enables AV in the absence of explicit features and training corpora. In an experimental setup based on eight evaluation corpora (each in a different language), we show that our approach yields competitive results against the current state of the art and other noteworthy AV baselines.

Notes

1. One-Class Compression Authorship Verifier.
2. PAN is a series of scientific events and shared tasks on digital text forensics [28].
3. Homotopy-based Classification is used in face recognition, where the goal is to measure the contribution of known faces in the generation of an unknown face [6].
4. Available under http://pan.webis.de.
5. Area Under the ROC-Curve.
6. Prediction by Partial Matching. Note that d refers to the order of the PPM model.
7. We use Hathcock’s library (https://github.com/adamhathcock/sharpcompress).
8. “Large Text Compression Benchmark” (http://mattmahoney.net/dc/text.html).
9. Note that this measure is not a metric, as all conditions a metric must satisfy (identity, symmetry and triangle inequality) are not met.
10. All corpora are available upon request under http://bit.do/ECIR_2018.
11. Evidence for this can be seen by comparing the number of published papers across different bibliographic databases such as Google Scholar or Microsoft Academic.
12. We used Van Asch’s script (http://www.clips.uantwerpen.be/scripts/art).
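
Note 5 names the area under the ROC curve (AUC) as an evaluation measure. As a rough illustration of what that quantity captures, the sketch below computes AUC as the fraction of (same-author, different-author) case pairs in which the same-author case receives the higher score, with ties counted as half. The function name, labels and scores are hypothetical and chosen only for illustration; they are not taken from the paper or its evaluation.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney view).

    labels: True for a same-author case, False for a different-author case.
    scores: higher values mean "more likely the same author".
    """
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical verification cases (illustrative values only).
example_labels = [True, True, False, True, False, False]
example_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]

if __name__ == "__main__":
    print(f"AUC = {auc(example_labels, example_scores):.3f}")
```

Because AUC depends only on how the cases are ranked, it can be computed without fixing any accept/reject threshold, which makes it a natural companion to threshold-dependent measures.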

References

1. Bagnall, D.: Author identification using multi-headed recurrent neural networks. In: Working Notes of CLEF 2015 - Conference and Labs of the Evaluation Forum, Toulouse, France, 8–11 September 2015
2. Castillo, E., Cervantes, O., No, D.V., Báez, D.: Author verification using a graph-based representation. Int. J. Comput. Appl. 123(14), 1–8 (2015)
3. Forner, P., Navigli, R., Tufis, D., Ferro, N. (eds.): Working notes for CLEF 2013 Conference, Valencia, Spain, 23–26 September 2013, CEUR Workshop Proceedings, vol. 1179 (2014). CEUR-WS.org
4. Halvani, O.: Enron Authorship Verification Corpus, Mendeley Data, v1 (2017)
5. Halvani, O., Winter, C., Graner, L.: On the usefulness of compression models for authorship verification. In: Proceedings of the 12th International Conference on Availability, Reliability and Security, ARES 2017, pp. 54:1–54:10 (2017)
6. Hernández, J.G.G., Casillas, J., Ledesma, P., Pineda, G.F., Ruíz, I.V.M.: Homotopy based classification for author verification task: notebook for PAN at CLEF 2015. In: Working Notes of CLEF 2015 - Conference and Labs of the Evaluation Forum, Toulouse, France, 8–11 September 2015
7. Hürlimann, M., Weck, B., von den Berg, E., Šuster, S., Nissim, M.: GLAD: Groningen lightweight authorship detection. In: Working Notes of CLEF 2015 - Conference and Labs of the Evaluation Forum, Toulouse, France, 8–11 September 2015
8. Jankowska, M., Keselj, V., Milios, E.E.: Proximity based one-class classification with common N-gram dissimilarity for authorship verification task notebook for PAN at CLEF 2013. In: Forner et al. [3]
9. Noecker Jr., J., Ryan, M.: Distractorless authorship verification. In: Proceedings of the Eight International Conference on Language Resources and Evaluation (LREC 2012), Istanbul, Turkey, May 2012
10. Juola, P., Stamatatos, E.: Overview of the author identification task at PAN 2013. In: Forner et al. [3]
11. Kestemont, M., Stover, J.A., Koppel, M., Karsdorp, F., Daelemans, W.: Authenticating the writings of Julius Caesar. Expert Syst. Appl. 63, 86–96 (2016)
12. Khonji, M., Iraqi, Y.: A slightly-modified GI-based author-verifier with lots of features (ASGALF). In: Working Notes for CLEF 2014 Conference, Sheffield, UK, 15–18 September 2014, pp. 977–983 (2014)
13. Kocher, M., Savoy, J.: A simple and efficient algorithm for authorship verification. J. Assoc. Inf. Sci. Technol. 68(1), 259–269 (2017). https://doi.org/10.1002/asi.23648
14. Koppel, M., Schler, J.: Authorship verification as a one-class classification problem. In: Brodley, C.E. (ed.) Machine Learning, Proceedings of the Twenty-First International Conference (ICML 2004), vol. 69, Banff, Alberta, Canada, 4–8 July 2004. ACM (2004)
15. Koppel, M., Schler, J., Argamon, S.: Authorship attribution in the wild. Lang. Res. Eval. 45(1), 83–94 (2011)
16. Koppel, M., Winter, Y.: Determining if two documents are written by the same author. JASIST 65(1), 178–187 (2014)
17. Moreau, E., Jayapal, A., Lynch, G., Vogel, C.: Author verification: basic stacked generalization applied to predictions from a set of heterogeneous learners-notebook for PAN at CLEF 2015. In: Cappellato, L., Ferro, N., Jones, G., San Juan, E. (eds.) CLEF 2015 Evaluation Labs and Workshop - Working Notes Papers, 8–11 September 2015, Toulouse, France (2015). CEUR-WS.org
18. Nagaprasad, S., Reddy, V., Babu, A.: Authorship attribution based on data compression for Telugu text. Int. J. Comput. Appl. 110(1), 1–5 (2015)
19. Potha, N., Stamatatos, E.: A profile-based method for authorship verification. In: Likas, A., Blekas, K., Kalles, D. (eds.) SETN 2014. LNCS (LNAI), vol. 8445, pp. 313–326. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-07064-3_25
20. Potha, N., Stamatatos, E.: An improved Impostors method for authorship verification. In: Jones, G.J.F., Lawless, S., Gonzalo, J., Kelly, L., Goeuriot, L., Mandl, T., Cappellato, L., Ferro, N. (eds.) CLEF 2017. LNCS, vol. 10456, pp. 138–144. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-65813-1_14
21. Potthast, M., Kiesel, J., Reinartz, K., Bevendorff, J., Stein, B.: A stylometric inquiry into hyperpartisan and fake news. ArXiv e-prints, February 2017
22. Rexha, A., Kröll, M., Ziak, H., Kern, R.: Extending scientific literature search by including the author’s writing style. In: Mayr, P., Frommholz, I., Cabanac, G. (eds.) Proceedings of the Fifth Workshop on Bibliometric-Enhanced Information Retrieval (BIR) Co-located with the 39th European Conference on Information Retrieval (ECIR 2017), Aberdeen, UK, 9th April 2017, CEUR Workshop Proceedings, vol. 1823, pp. 93–100 (2017). CEUR-WS.org
23. Sculley, D., Brodley, C.E.: Compression and machine learning: a new perspective on feature space vectors. In: DCC, pp. 332–332. IEEE Computer Society
24. Seidman, S.: Authorship verification using the impostors method notebook for PAN at CLEF 2013. In: Forner et al. [3]
25. Shu, K., Sliva, A., Wang, S., Tang, J., Liu, H.: Fake news detection on social media: a data mining perspective. ACM SIGKDD Explor. Newsl. 19(1), 22–36 (2017)
26. Stamatatos, E., Daelemans, W., Verhoeven, B., Juola, P., López-López, A., Potthast, M., Stein, B.: Overview of the author identification task at PAN 2015. In: Working Notes of CLEF 2015 - Conference and Labs of the Evaluation Forum, Toulouse, France, 8–11 September 2015
27. Stamatatos, E., Daelemans, W., Verhoeven, B., Stein, B., Potthast, M., Juola, P., Sánchez-Pérez, M.A., Barrón-Cedeño, A.: Overview of the author identification task at PAN 2014. In: Working Notes for CLEF 2014 Conference, Sheffield, UK, 15–18 September 2014, pp. 877–897 (2014)
28. Stamatatos, E., Potthast, M., Rangel, F., Rosso, P., Stein, B.: Overview of the PAN/CLEF 2015 evaluation lab. In: Mothe, J., Savoy, J., Kamps, J., Pinel-Sauvagnat, K., Jones, G.J.F., SanJuan, E., Cappellato, L., Ferro, N. (eds.) CLEF 2015. LNCS, vol. 9283, pp. 518–538. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24027-5_49
29. Stein, B., Lipka, N., Zu Eissen, S.M.: Meta analysis within authorship verification. In: 19th International Workshop on Database and Expert Systems Applications (DEXA 2008), 1–5 September 2008, Turin, Italy, pp. 34–39. IEEE Computer Society (2008)
30. Tax, D.M.J.: One-class classification: concept learning in the absence of counter-examples. Ph.D. thesis (2001)

Acknowledgments

This work was supported by the German Federal Ministry of Education and Research (BMBF) under the project “DORIAN” (Scrutinise and thwart disinformation). We would like to thank Christian Winter and Felix Mayer for their valuable reviews that helped to improve the quality of this paper.

Author information

Authors and Affiliations

Fraunhofer Institute for Secure Information Technology, Rheinstraße 75, 64295, Darmstadt, Germany
Oren Halvani, Lukas Graner & Inna Vogel

Corresponding author

Correspondence to Oren Halvani.

Editor information

Editors and Affiliations

Department of Informatics, Systems, and Communication, University of Milano-Bicocca, Milan, Italy
Gabriella Pasi
LIP6 – UPMC/CNRS, University Pierre et Marie Curie, Paris, France
Benjamin Piwowarski
University of Glasgow, Glasgow, United Kingdom
Leif Azzopardi
Technical University of Vienna, Vienna, Austria
Allan Hanbury

About this paper

Cite this paper

Halvani, O., Graner, L., Vogel, I. (2018). Authorship Verification in the Absence of Explicit Features and Thresholds. In: Pasi, G., Piwowarski, B., Azzopardi, L., Hanbury, A. (eds) Advances in Information Retrieval. ECIR 2018. Lecture Notes in Computer Science, vol 10772. Springer, Cham. https://doi.org/10.1007/978-3-319-76941-7_34

DOI: https://doi.org/10.1007/978-3-319-76941-7_34
Published: 01 March 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-76940-0
Online ISBN: 978-3-319-76941-7
eBook Packages: Computer Science, Computer Science (R0)
