Authorship Attribution of Polish Newspaper Articles

  • Conference paper
Artificial Intelligence and Soft Computing (ICAISC 2016)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9693)

Abstract

This paper examines a machine learning approach to authorship attribution of Polish-language articles. We focus on how data volume, the number of authors, and thematic homogeneity affect attribution quality. We also study the impact of feature selection under several criteria, mainly the chi-square and information gain measures, and the effect of combining features of different types. Results are reported on the Rzeczpospolita corpus in terms of the \(F_1\) measure.
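The abstract names chi-square and information gain as the main feature selection criteria and \(F_1\) as the evaluation measure. The following is a minimal sketch of the textbook definitions of these three quantities, assuming binary term presence and a one-vs-rest view of each author. The function names, the 2x2 count layout, and the example counts are illustrative assumptions and are not taken from the paper.

```python
import math


def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for a 2x2 term/author contingency table.

    n11: documents by the author that contain the term
    n10: documents by other authors that contain the term
    n01: documents by the author that lack the term
    n00: documents by other authors that lack the term
    """
    n = n11 + n10 + n01 + n00
    numerator = n * (n11 * n00 - n10 * n01) ** 2
    denominator = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    return numerator / denominator if denominator else 0.0


def entropy(counts):
    """Shannon entropy (in bits) of a distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts if c)


def information_gain(n11, n10, n01, n00):
    """Reduction in class entropy obtained by observing term presence."""
    n = n11 + n10 + n01 + n00
    class_entropy = entropy([n11 + n01, n10 + n00])
    with_term = n11 + n10
    without_term = n01 + n00
    conditional = 0.0
    if with_term:
        conditional += with_term / n * entropy([n11, n10])
    if without_term:
        conditional += without_term / n * entropy([n01, n00])
    return class_entropy - conditional


def f1_score(tp, fp, fn):
    """F1 as the harmonic mean of precision and recall."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Hypothetical counts: a term that perfectly separates one author from the rest.
print(chi_square(10, 0, 0, 10))
print(information_gain(10, 0, 0, 10))
print(f1_score(tp=8, fp=2, fn=2))
```

In a feature selection setting, scores like these would be computed for every candidate feature and only the top-ranked features kept; how the paper aggregates scores across authors is not stated in the abstract.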

Acknowledgments

This research is supported by AGH - University of Science and Technology (AGH-UST) Grant no. 11.11.230.124 (statutory project).

Corresponding author

Correspondence to Marcin Kuta.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Kuta, M., Puto, B., Kitowski, J. (2016). Authorship Attribution of Polish Newspaper Articles. In: Rutkowski, L., Korytkowski, M., Scherer, R., Tadeusiewicz, R., Zadeh, L., Zurada, J. (eds) Artificial Intelligence and Soft Computing. ICAISC 2016. Lecture Notes in Computer Science, vol 9693. Springer, Cham. https://doi.org/10.1007/978-3-319-39384-1_41

  • DOI: https://doi.org/10.1007/978-3-319-39384-1_41

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-39383-4

  • Online ISBN: 978-3-319-39384-1

  • eBook Packages: Computer Science, Computer Science (R0)
