Abstract
Enhancing information retrieval systems with the ability to take people's writing style into account opens the door to a number of applications. For example, one can link articles by authorship, which can help identify authors who generate hoaxes and deliberate misinformation in news stories distributed across different platforms. Authorship verification (AV) is a technique that can be used for this purpose. AV is the task of judging whether two or more documents stem from the same author. The majority of existing AV approaches rely on machine learning concepts based on explicitly defined stylistic features and on complex models that involve a fair number of parameters. Moreover, many existing AV methods are based on explicit thresholds (needed to accept or reject a stated authorship), which are determined on training corpora. We propose a novel parameter-free AV approach that derives its thresholds for each verification case individually and enables AV in the absence of explicit features and training corpora. In an experimental setup based on eight evaluation corpora (each in a different language), we show that our approach yields results competitive with the current state of the art and other noteworthy AV baselines.
Notes
1. One-Class Compression Authorship Verifier.
2. PAN is a series of scientific events and shared tasks on digital text forensics [28].
3. Homotopy-based Classification is used in face recognition, where the goal is to measure the contribution of known faces in the generation of an unknown face [6].
4. Available under http://pan.webis.de.
5. Area Under the ROC-Curve.
6. Prediction by Partial Matching. Note that d refers to the order of the PPM model.
7. We use Hathcock’s library (https://github.com/adamhathcock/sharpcompress).
8. “Large Text Compression Benchmark” (http://mattmahoney.net/dc/text.html).
9. Note that this measure is not a metric, as all conditions a metric must satisfy (identity, symmetry and triangle inequality) are not met.
10. All corpora are available upon request under http://bit.do/ECIR_2018.
11. Evidence for this can be seen by comparing the number of published papers across different bibliographic databases such as Google Scholar or Microsoft Academic.
12. We used Van Asch’s script (http://www.clips.uantwerpen.be/scripts/art).
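The remark in the notes that a compression-based measure need not be a metric can be illustrated with a small sketch. It is not the measure or the PPM compressor used in the paper: as assumptions, it substitutes the well-known normalized compression distance (NCD) and Python's built-in zlib, and the helper names `compressed_size`, `ncd` and `toy_text` as well as the toy vocabularies are made up for illustration.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (stand-in compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def toy_text(vocabulary: list[str], n_words: int, seed: int) -> bytes:
    """Deterministic pseudo-text drawn from a small, made-up vocabulary."""
    rng = random.Random(seed)
    words = (rng.choice(vocabulary) for _ in range(n_words))
    return " ".join(words).encode("utf-8")


doc_a = toy_text(["river", "stone", "quiet", "morning", "walks", "beside"], 400, seed=1)
doc_b = toy_text(["ledger", "profit", "quarter", "audit", "margin", "forecast"], 400, seed=2)

# A metric would require d(a, a) == 0 and d(a, b) == d(b, a).
print("d(a, a) =", ncd(doc_a, doc_a))
print("d(a, b) =", ncd(doc_a, doc_b))
print("d(b, a) =", ncd(doc_b, doc_a))
```

With a real compressor, the distance of a document to itself is typically small but not zero, because the second copy still costs a few bytes to encode, and the two orderings of a pair may compress to slightly different sizes. This is the sense in which such measures behave as dissimilarities rather than metrics.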
References
Bagnall, D.: Author identification using multi-headed recurrent neural networks. In: Working Notes of CLEF 2015 - Conference and Labs of the Evaluation Forum, Toulouse, France, 8–11 September 2015
Castillo, E., Cervantes, O., No, D.V., Báez, D.: Author verification using a graph-based representation. Int. J. Comput. Appl. 123(14), 1–8 (2015)
Forner, P., Navigli, R., Tufis, D., Ferro, N. (eds.): Working notes for CLEF 2013 Conference, Valencia, Spain, 23–26 September 2013, CEUR Workshop Proceedings, vol. 1179 (2014). CEUR-WS.org
Halvani, O.: Enron Authorship Verification Corpus, Mendeley Data, v1 (2017)
Halvani, O., Winter, C., Graner, L.: On the usefulness of compression models for authorship verification. In: Proceedings of the 12th International Conference on Availability, Reliability and Security, ARES 2017, pp. 54:1–54:10 (2017)
Hernández, J.G.G., Casillas, J., Ledesma, P., Pineda, G.F., Ruíz, I.V.M.: Homotopy based classification for author verification task: notebook for PAN at CLEF 2015. In: Working Notes of CLEF 2015 - Conference and Labs of the Evaluation Forum, Toulouse, France, 8–11 September 2015
Hürlimann, M., Weck, B., von den Berg, E., Šuster, S., Nissim, M.: GLAD: Groningen lightweight authorship detection. In: Working Notes of CLEF 2015 - Conference and Labs of the Evaluation Forum, Toulouse, France, 8–11 September 2015
Jankowska, M., Keselj, V., Milios, E.E.: Proximity based one-class classification with common N-gram dissimilarity for authorship verification task notebook for PAN at CLEF 2013. In: Forner et al. [3]
Noecker Jr., J., Ryan, M.: Distractorless authorship verification. In: Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC 2012), Istanbul, Turkey, May 2012
Juola, P., Stamatatos, E.: Overview of the author identification task at PAN 2013. In: Forner et al. [3]
Kestemont, M., Stover, J.A., Koppel, M., Karsdorp, F., Daelemans, W.: Authenticating the writings of Julius Caesar. Expert Syst. Appl. 63, 86–96 (2016)
Khonji, M., Iraqi, Y.: A slightly-modified GI-based author-verifier with lots of features (ASGALF). In: Working Notes for CLEF 2014 Conference, Sheffield, UK, 15–18 September 2014, pp. 977–983 (2014)
Kocher, M., Savoy, J.: A simple and efficient algorithm for authorship verification. J. Assoc. Inf. Sci. Technol. 68(1), 259–269 (2017). https://doi.org/10.1002/asi.23648
Koppel, M., Schler, J.: Authorship verification as a one-class classification problem. In: Brodley, C.E. (ed.) Machine Learning, Proceedings of the Twenty-First International Conference (ICML 2004), vol. 69, Banff, Alberta, Canada, 4–8 July 2004. ACM (2004)
Koppel, M., Schler, J., Argamon, S.: Authorship attribution in the wild. Lang. Res. Eval. 45(1), 83–94 (2011)
Koppel, M., Winter, Y.: Determining if two documents are written by the same author. JASIST 65(1), 178–187 (2014)
Moreau, E., Jayapal, A., Lynch, G., Vogel, C.: Author verification: basic stacked generalization applied to predictions from a set of heterogeneous learners-notebook for PAN at CLEF 2015. In: Cappellato, L., Ferro, N., Jones, G., San Juan, E. (eds.) CLEF 2015 Evaluation Labs and Workshop - Working Notes Papers, 8–11 September 2015, Toulouse, France (2015). CEUR-WS.org
Nagaprasad, S., Reddy, V., Babu, A.: Authorship attribution based on data compression for Telugu text. Int. J. Comput. Appl. 110(1), 1–5 (2015)
Potha, N., Stamatatos, E.: A profile-based method for authorship verification. In: Likas, A., Blekas, K., Kalles, D. (eds.) SETN 2014. LNCS (LNAI), vol. 8445, pp. 313–326. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-07064-3_25
Potha, N., Stamatatos, E.: An improved Impostors method for authorship verification. In: Jones, G.J.F., Lawless, S., Gonzalo, J., Kelly, L., Goeuriot, L., Mandl, T., Cappellato, L., Ferro, N. (eds.) CLEF 2017. LNCS, vol. 10456, pp. 138–144. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-65813-1_14
Potthast, M., Kiesel, J., Reinartz, K., Bevendorff, J., Stein, B.: A stylometric inquiry into hyperpartisan and fake news. ArXiv e-prints, February 2017
Rexha, A., Kröll, M., Ziak, H., Kern, R.: Extending scientific literature search by including the author’s writing style. In: Mayr, P., Frommholz, I., Cabanac, G. (eds.) Proceedings of the Fifth Workshop on Bibliometric-Enhanced Information Retrieval (BIR) Co-located with the 39th European Conference on Information Retrieval (ECIR 2017), Aberdeen, UK, 9th April 2017, CEUR Workshop Proceedings, vol. 1823, pp. 93–100 (2017). CEUR-WS.org
Sculley, D., Brodley, C.E.: Compression and machine learning: a new perspective on feature space vectors. In: DCC, pp. 332–332. IEEE Computer Society
Seidman, S.: Authorship verification using the impostors method notebook for PAN at CLEF 2013. In: Forner et al. [3]
Shu, K., Sliva, A., Wang, S., Tang, J., Liu, H.: Fake news detection on social media: a data mining perspective. ACM SIGKDD Explor. Newsl. 19(1), 22–36 (2017)
Stamatatos, E., Daelemans, W., Verhoeven, B., Juola, P., López-López, A., Potthast, M., Stein, B.: Overview of the author identification task at PAN 2015. In: Working Notes of CLEF 2015 - Conference and Labs of the Evaluation Forum, Toulouse, France, 8–11 September 2015
Stamatatos, E., Daelemans, W., Verhoeven, B., Stein, B., Potthast, M., Juola, P., Sánchez-Pérez, M.A., Barrón-Cedeño, A.: Overview of the author identification task at PAN 2014. In: Working Notes for CLEF 2014 Conference, Sheffield, UK, 15–18 September 2014, pp. 877–897 (2014)
Stamatatos, E., Potthast, M., Rangel, F., Rosso, P., Stein, B.: Overview of the PAN/CLEF 2015 evaluation lab. In: Mothe, J., Savoy, J., Kamps, J., Pinel-Sauvagnat, K., Jones, G.J.F., SanJuan, E., Cappellato, L., Ferro, N. (eds.) CLEF 2015. LNCS, vol. 9283, pp. 518–538. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24027-5_49
Stein, B., Lipka, N., Zu Eissen, S.M.: Meta analysis within authorship verification. In: 19th International Workshop on Database and Expert Systems Applications (DEXA 2008), 1–5 September 2008, Turin, Italy, pp. 34–39. IEEE Computer Society (2008)
Tax, D.M.J.: One-class classification: concept learning in the absence of counter-examples. Ph.D. thesis (2001)
Acknowledgments
This work was supported by the German Federal Ministry of Education and Research (BMBF) under the project “DORIAN” (Scrutinise and thwart disinformation). We would like to thank Christian Winter and Felix Mayer for their valuable reviews that helped to improve the quality of this paper.
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Halvani, O., Graner, L., Vogel, I. (2018). Authorship Verification in the Absence of Explicit Features and Thresholds. In: Pasi, G., Piwowarski, B., Azzopardi, L., Hanbury, A. (eds) Advances in Information Retrieval. ECIR 2018. Lecture Notes in Computer Science, vol 10772. Springer, Cham. https://doi.org/10.1007/978-3-319-76941-7_34
DOI: https://doi.org/10.1007/978-3-319-76941-7_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-76940-0
Online ISBN: 978-3-319-76941-7
eBook Packages: Computer Science, Computer Science (R0)