Text Classification by Genres Based on Rhythmic Characteristics

Automatic Control and Computer Sciences

Abstract

This article considers the rhythm of texts in various genres: fiction (novels), advertising, scientific articles, reviews, tweets, and political articles. The authors identify lexical and grammatical devices in the texts, such as anaphora, epiphora, diacope, and aposiopesis, that serve as markers of a text's rhythm. From these devices, statistical characteristics are calculated that describe them quantitatively and structurally. The resulting text model is visualized for statistical analysis with boxplots and heatmaps, which reveal differences in rhythm across genres. The boxplots show that almost all genres differ from one another in the overall density of rhythmic characteristics, and the heatmaps show that the genres differ in rhythm structure. Rhythmic characteristics were then used successfully to classify texts into the six genres. The high classification quality shows that rhythmic characteristics are a good marker for most genres, especially fiction. The experiments were carried out with the ProseRhythmDetector software for Russian and English, using corpora of 300 texts for each language.
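The abstract describes detecting rhythm figures and turning them into statistical characteristics such as overall density, but it does not give ProseRhythmDetector's detection rules. The Python sketch below is therefore only an illustration under simplified, assumed definitions: sentences are split naively on terminal punctuation; anaphora and epiphora are counted as adjacent sentences that share their first or last word; diacope is a word repeated within a sentence with one to three words in between; and aposiopesis is a sentence ending in an ellipsis. The function names, the `max_gap` parameter, and the per-sentence density normalization are hypothetical choices, not the authors' method.

```python
import re

# Letter runs only (Latin, Cyrillic, etc.); digits and underscores are excluded,
# so the same tokenizer works for English and Russian text.
WORD_RE = re.compile(r"[^\W\d_]+")


def split_sentences(text: str) -> list[str]:
    """Naive splitter (assumption): break after ., !, ? or ... followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?…])\s+", text.strip()) if s]


def tokens(sentence: str) -> list[str]:
    return [w.lower() for w in WORD_RE.findall(sentence)]


def count_anaphora(sents: list[list[str]]) -> int:
    """Adjacent sentences that begin with the same word (simplified definition)."""
    return sum(1 for a, b in zip(sents, sents[1:]) if a and b and a[0] == b[0])


def count_epiphora(sents: list[list[str]]) -> int:
    """Adjacent sentences that end with the same word (simplified definition)."""
    return sum(1 for a, b in zip(sents, sents[1:]) if a and b and a[-1] == b[-1])


def count_diacope(sents: list[list[str]], max_gap: int = 3) -> int:
    """A word repeated in the same sentence with 1..max_gap words in between."""
    count = 0
    for words in sents:
        for i, w in enumerate(words):
            # Positions i+2 .. i+1+max_gap leave 1..max_gap intervening words.
            if w in words[i + 2 : i + 2 + max_gap]:
                count += 1
    return count


def rhythm_features(text: str) -> dict[str, float]:
    """Per-sentence density of each figure plus the overall density (hypothetical)."""
    raw = split_sentences(text)
    sents = [tokens(s) for s in raw]
    n = max(len(raw), 1)  # avoid division by zero on empty input
    counts = {
        "anaphora": count_anaphora(sents),
        "epiphora": count_epiphora(sents),
        "diacope": count_diacope(sents),
        # Aposiopesis approximated as a sentence broken off with an ellipsis.
        "aposiopesis": sum(1 for s in raw if s.endswith(("...", "…"))),
    }
    features = {name: c / n for name, c in counts.items()}
    features["total_density"] = sum(counts.values()) / n
    return features


if __name__ == "__main__":
    print(rhythm_features("I came. I saw. I conquered. Bond, James Bond. Well, I never..."))
```

Rules this crude will, for example, count two sentences that both start with "the" as anaphora; a realistic detector would need stop-word filtering and clause-level segmentation. Vectors like the one returned here are the kind of per-text input that could be aggregated per genre for boxplots and heatmaps or passed to a classifier.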

Fig. 1.
Fig. 2.

Author information

Correspondence to K. V. Lagutina, N. S. Lagutina or E. I. Boychuk.

Ethics declarations

The authors declare that they have no conflicts of interest.

Additional information

Translated by A. Kolemesin

About this article

Cite this article

Lagutina, K.V., Lagutina, N.S. & Boychuk, E.I. Text Classification by Genres Based on Rhythmic Characteristics. Aut. Control Comp. Sci. 56, 735–743 (2022). https://doi.org/10.3103/S0146411622070136

  • DOI: https://doi.org/10.3103/S0146411622070136