Authorship Attribution of Polish Newspaper Articles

Kuta, Marcin; Puto, Bartłomiej; Kitowski, Jacek

doi:10.1007/978-3-319-39384-1_41


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9693)

Included in the following conference series:

International Conference on Artificial Intelligence and Soft Computing


Abstract

This paper examines the machine learning approach to authorship attribution of Polish-language articles. The focus is on how data volume, the number of authors, and thematic homogeneity affect attribution quality. We study the impact of feature selection under various criteria, mainly the chi-square and information gain measures, as well as the effect of combining features of different types. Results are reported for the Rzeczpospolita corpus in terms of the $F_1$ measure.
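The abstract names chi-square and information gain as feature selection criteria and $F_1$ as the quality measure. The following minimal Python sketch illustrates these three quantities; it is not the authors' implementation. Several choices are assumptions, not taken from the paper: each article is reduced to a set of terms (binary presence), a term's chi-square score is its maximum over authors, and $F_1$ is macro-averaged over authors. All function names are hypothetical.

```python
import math
from collections import Counter


def contingency(docs, labels, term, author):
    """Count the 2x2 table for (term present?, written by author?).

    a: author's articles containing the term
    b: other authors' articles containing the term
    c: author's articles without the term
    d: other authors' articles without the term
    """
    a = b = c = d = 0
    for terms, label in zip(docs, labels):
        has_term = term in terms
        is_author = label == author
        if has_term and is_author:
            a += 1
        elif has_term:
            b += 1
        elif is_author:
            c += 1
        else:
            d += 1
    return a, b, c, d


def chi_square(docs, labels, term, author):
    """Chi-square statistic of the term/author 2x2 contingency table."""
    a, b, c, d = contingency(docs, labels, term, author)
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def entropy(counts):
    """Shannon entropy in bits of a distribution given as raw counts."""
    counts = list(counts)
    total = sum(counts)
    if not total:
        return 0.0
    return -sum(k / total * math.log2(k / total) for k in counts if k)


def information_gain(docs, labels, term):
    """Drop in author-label entropy from knowing whether the term occurs."""
    with_term = Counter(lab for t, lab in zip(docs, labels) if term in t)
    without_term = Counter(lab for t, lab in zip(docs, labels) if term not in t)
    p_term = sum(with_term.values()) / len(labels)
    conditional = (p_term * entropy(with_term.values())
                   + (1 - p_term) * entropy(without_term.values()))
    return entropy(Counter(labels).values()) - conditional


def select_features(docs, labels, k, criterion="chi2"):
    """Keep the k highest-scoring terms; ties are broken alphabetically."""
    authors = set(labels)

    def score(term):
        if criterion == "chi2":
            # Assumption: aggregate per-author chi-square scores with max.
            return max(chi_square(docs, labels, term, a) for a in authors)
        return information_gain(docs, labels, term)

    vocab = set().union(*docs)
    return sorted(vocab, key=lambda t: (-score(t), t))[:k]


def macro_f1(y_true, y_pred):
    """Unweighted mean over authors of F1 = 2TP / (2TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    scores = []
    for author in set(y_true) | set(y_pred):
        tp = sum(t == author and p == author for t, p in pairs)
        fp = sum(t != author and p == author for t, p in pairs)
        fn = sum(t == author and p != author for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy usage: four articles by two hypothetical authors "A" and "B".
docs = [{"kraj", "rząd"}, {"kraj", "sport"}, {"mecz", "sport"}, {"mecz", "gol"}]
labels = ["A", "A", "B", "B"]
print(select_features(docs, labels, 2))  # ['kraj', 'mecz']
print(macro_f1(labels, ["A", "B", "B", "B"]))  # about 0.733
```

In the toy data, "kraj" and "mecz" each occur only in one author's articles, so both criteria rank them highest, while "sport" is shared evenly and scores zero. Real systems would typically use a library implementation and may weight terms by frequency rather than presence.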



Acknowledgments

This research is supported by AGH - University of Science and Technology (AGH-UST) Grant no. 11.11.230.124 (statutory project).

Author information

Authors and Affiliations

Department of Computer Science, Faculty of Computer Science, Electronics and Telecommunications, AGH University of Science and Technology, Al. Mickiewicza 30, 30-059, Krakow, Poland
Marcin Kuta, Bartłomiej Puto & Jacek Kitowski


Corresponding author

Correspondence to Marcin Kuta.

Editor information

Editors and Affiliations

Częstochowa University of Technology, Czestochowa, Poland
Leszek Rutkowski
Częstochowa University of Technology, Czestochowa, Poland
Marcin Korytkowski
Częstochowa University of Technology, Czestochowa, Poland
Rafał Scherer
AGH University of Science and Technology, Krakow, Poland
Ryszard Tadeusiewicz
University of California, Berkeley, California, USA
Lotfi A. Zadeh
University of Louisville, Louisville, Kentucky, USA
Jacek M. Zurada


About this paper

Cite this paper

Kuta, M., Puto, B., Kitowski, J. (2016). Authorship Attribution of Polish Newspaper Articles. In: Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L., Zurada, J. (eds) Artificial Intelligence and Soft Computing. ICAISC 2016. Lecture Notes in Computer Science, vol 9693. Springer, Cham. https://doi.org/10.1007/978-3-319-39384-1_41


DOI: https://doi.org/10.1007/978-3-319-39384-1_41
Published: 29 May 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-39383-4
Online ISBN: 978-3-319-39384-1
eBook Packages: Computer Science, Computer Science (R0)
