Abstract
Stylometric analysis of Slavic-language texts is a challenging and comparatively little-explored problem in computational text analysis. The aim of this paper is to develop and verify stylometric methods that recognize the author of a literary text in Ukrainian, as well as the author's age and gender, with usable accuracy. Common stylistic features were computed on a self-designed corpus, and different feature selection and classification methods were analyzed. A further objective is to analyze several stylometric variables and test their statistical significance using \(\chi^2\) feature selection.
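The abstract does not spell out the feature set or the selection pipeline, so the following is only a minimal sketch of how \(\chi^2\) feature selection can rank stylometric variables. It assumes function-word counts as features and uses a few invented Ukrainian sentences from two hypothetical "authors" (not the paper's corpus). The scoring follows the computation behind scikit-learn's `sklearn.feature_selection.chi2` (observed per-class feature sums against expected sums under class frequencies), which in practice would typically be combined with `SelectKBest`; here it is written with the standard library only. The helper names `word_counts`, `chi2_scores`, and `select_k_best` are illustrative.

```python
import re
from collections import Counter


def word_counts(text, vocabulary):
    """Return how often each vocabulary word occurs in the lower-cased text."""
    # \w matches Unicode letters in Python 3, so Cyrillic tokens are kept whole.
    counts = Counter(re.findall(r"\w+", text.lower()))
    return [counts[word] for word in vocabulary]


def chi2_scores(X, y):
    """Chi-square statistic of each non-negative feature against the class labels.

    observed = sum of the feature over the documents of a class;
    expected = overall feature sum times the fraction of documents in that class.
    A feature that never occurs gets a score of 0.0 here.
    """
    classes = sorted(set(y))
    n_docs = len(y)
    scores = []
    for j in range(len(X[0])):
        total = sum(row[j] for row in X)
        score = 0.0
        for c in classes:
            observed = sum(row[j] for row, label in zip(X, y) if label == c)
            expected = total * y.count(c) / n_docs
            if expected > 0:
                score += (observed - expected) ** 2 / expected
        scores.append(score)
    return scores


def select_k_best(scores, k):
    """Indices of the k highest-scoring features (ties keep original order)."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


# Hypothetical toy data: two invented "authors" with different function-word habits.
documents = [
    ("і ми йшли і співали і сміялись", "A"),
    ("і сонце сіло і ніч прийшла", "A"),
    ("але він не знав, що робити", "B"),
    ("але вона не чекала, що він прийде", "B"),
]
vocabulary = ["і", "але", "не", "що", "та"]

X = [word_counts(text, vocabulary) for text, _ in documents]
y = [label for _, label in documents]
scores = chi2_scores(X, y)
selected = [vocabulary[j] for j in select_k_best(scores, 2)]
print(scores)    # [5.0, 2.0, 2.0, 2.0, 0.0]
print(selected)  # ['і', 'але']
```

On this toy data the conjunction "і" separates the two invented authors most strongly, while "та" never occurs and scores zero. Because the statistic treats feature values as counts, it only applies to non-negative features such as raw or relative frequencies.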
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Mazurko, A., Walkowiak, T. (2020). Computer Based Stylometric Analysis of Texts in Ukrainian Language. In: Rutkowski, L., Scherer, R., Korytkowski, M., Pedrycz, W., Tadeusiewicz, R., Zurada, J.M. (eds) Artificial Intelligence and Soft Computing. ICAISC 2020. Lecture Notes in Computer Science, vol 12416. Springer, Cham. https://doi.org/10.1007/978-3-030-61534-5_20
DOI: https://doi.org/10.1007/978-3-030-61534-5_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-61533-8
Online ISBN: 978-3-030-61534-5
eBook Packages: Computer Science, Computer Science (R0)