Analysis of the Influence of Mixed-Level Stylometric Characteristics on the Verification of Authors of Literary Works

Automatic Control and Computer Sciences

Abstract

This article analyzes how various combinations of mixed-level stylometric characteristics affect the quality of authorship verification for Russian, English, and French prose texts. The study covers both low-level stylometric characteristics based on words and characters and higher-level structural ones. All stylometric characteristics are calculated automatically using the ProseRhythmDetector program, which makes it possible to analyze a large volume of works by many writers at the same time. Each text is associated with character-level, word-level, and structure-level stylometric vectors. During the experiments, the parameter sets of these three levels were combined with each other in all possible ways. The resulting vectors of stylometric characteristics were fed to various classifiers to perform verification and to identify the classifier best suited to the problem. The best results were obtained using the AdaBoost classifier. The average F-measure for all languages was over 92%. Detailed verification quality assessments are given and analyzed for each author. The use of high-level stylometric characteristics, in particular the frequency of N-grams of POS tags, opens the prospect of a more detailed analysis of authors' styles. The experiments show that the most accurate authorship verification results for literary texts in Russian, English, and French are obtained when structure-level characteristics are combined with word-level and/or character-level characteristics. The authors also conclude that stylometric characteristics influence the quality of authorship verification to different degrees in different languages.
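The combination step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the feature values are placeholders, the function names are hypothetical, and two points are assumptions, namely that "all possible ways" means every non-empty subset of the three levels, and that the F-measure is the balanced F1 score for binary verification labels.

```python
from itertools import combinations

# Hypothetical per-text feature vectors; placeholder values, not data from the study.
features = {
    "character": [0.12, 0.05],
    "word": [0.30, 0.07, 0.01],
    "structure": [0.40],
}

def level_combinations(levels):
    """Return every non-empty subset of the given levels, smallest first."""
    levels = list(levels)
    return [combo
            for size in range(1, len(levels) + 1)
            for combo in combinations(levels, size)]

def combined_vector(features, combo):
    """Concatenate the vectors of the chosen levels into one classifier input."""
    vector = []
    for level in combo:
        vector.extend(features[level])
    return vector

def f_measure(true_labels, predicted_labels):
    """Balanced F-measure (F1, an assumption) for labels where 1 = same author."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

combos = level_combinations(features)
for combo in combos:
    # Print each combination of levels and the length of its combined vector.
    print("+".join(combo), len(combined_vector(features, combo)))
```

With three levels this yields seven combinations. Each combined vector would then be passed to a classifier such as AdaBoost, and the predictions scored with the F-measure.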

Author information

Corresponding authors

Correspondence to A. M. Manakhova or N. S. Lagutina.

Ethics declarations

The authors declare that they have no conflicts of interest.

Additional information

Translated by A. Kolemesin

About this article

Cite this article

Manakhova, A.M., Lagutina, N.S. Analysis of the Influence of Mixed-Level Stylometric Characteristics on the Verification of Authors of Literary Works. Aut. Control Comp. Sci. 56, 744–761 (2022). https://doi.org/10.3103/S0146411622070148

  • DOI: https://doi.org/10.3103/S0146411622070148
