Abstract
Authorship verification is one of the most challenging tasks in style-based text categorization. Given a set of documents, all by the same author, and another document of unknown authorship, the question is whether the latter is also by that author. Recently, in the framework of the PAN-2013 evaluation lab, a competition in authorship verification was organized, and the vast majority of submitted approaches, including the best-performing models, followed the instance-based paradigm, where each text sample by one author is treated separately. In this paper, we show that the profile-based paradigm (where all samples by one author are treated cumulatively) can be very effective, surpassing the performance of the PAN-2013 winners without using any information from external sources. The proposed approach is fully trainable, and we demonstrate an appropriate tuning of parameter settings for the PAN-2013 corpora, achieving accurate answers especially when the cost of false negatives is high.
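To make the profile-based idea concrete, the following is a minimal sketch and not the method evaluated in the paper, whose details the abstract does not give. It assumes character n-gram frequencies as the representation, a relative squared-difference dissimilarity, and a fixed decision threshold; the function names and the parameters `n`, `top_k`, and `threshold` are hypothetical choices made only for illustration.

```python
from collections import Counter


def char_ngram_profile(text, n=3, top_k=1000):
    """Normalized frequencies of the top_k most frequent character n-grams."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {g: c / total for g, c in counts.most_common(top_k)}


def profile_dissimilarity(p, q):
    """Mean relative squared difference over the union of n-grams, in [0, 1]."""
    keys = set(p) | set(q)
    if not keys:
        return 0.0
    total = 0.0
    for g in keys:
        a, b = p.get(g, 0.0), q.get(g, 0.0)
        # a + b > 0 because g occurs in at least one profile
        total += ((a - b) / (a + b)) ** 2
    return total / len(keys)


def verify_profile_based(known_docs, unknown_doc, threshold=0.5, n=3, top_k=1000):
    """Profile-based decision: all known texts are merged into ONE profile.

    An instance-based variant would instead build one profile per known
    document and combine the per-document comparisons.
    The threshold here is an assumed placeholder, not a tuned value.
    """
    author_profile = char_ngram_profile("\n".join(known_docs), n, top_k)
    unknown_profile = char_ngram_profile(unknown_doc, n, top_k)
    score = profile_dissimilarity(author_profile, unknown_profile)
    return score <= threshold, score
```

In a trainable setting, values such as `n`, `top_k`, and `threshold` would be selected on a training corpus. Raising the threshold accepts more documents as same-author, which trades additional false positives for fewer false negatives.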
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Potha, N., Stamatatos, E. (2014). A Profile-Based Method for Authorship Verification. In: Likas, A., Blekas, K., Kalles, D. (eds) Artificial Intelligence: Methods and Applications. SETN 2014. Lecture Notes in Computer Science, vol 8445. Springer, Cham. https://doi.org/10.1007/978-3-319-07064-3_25
DOI: https://doi.org/10.1007/978-3-319-07064-3_25
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07063-6
Online ISBN: 978-3-319-07064-3
eBook Packages: Computer Science, Computer Science (R0)