Classifying Written Texts Through Rhythmic Features

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9883)

Abstract

Rhythm analysis of written texts focuses on literary analysis and mainly considers poetry. In this paper we investigate the relevance of rhythmic features for categorizing prose texts belonging to different genres. Our contribution is threefold. First, we define a set of rhythmic features for written texts. Second, we extract these features from three corpora: speeches, essays, and newspaper articles. Third, we perform feature selection by means of statistical analyses and determine a subset of features that efficiently discriminates between the three genres. We find that, using as few as eight rhythmic features, documents can be assigned to a genre with an accuracy of around 80 %, significantly higher than the 33 % baseline that results from random assignment.
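
The 33 % figure quoted above can be checked with a short sketch. It assumes the baseline means drawing one of the three genres uniformly at random for each document; the function name and the corpus sizes are hypothetical, since the abstract does not report document counts.

```python
from fractions import Fraction


def random_baseline_accuracy(class_counts):
    """Expected accuracy when each document gets a genre drawn uniformly at random."""
    k = len(class_counts)
    total = sum(class_counts.values())
    # A document of any genre is guessed correctly with probability 1/k,
    # so the expectation is sum_c (n_c / total) * (1 / k) = 1 / k.
    return sum(Fraction(n, total) * Fraction(1, k) for n in class_counts.values())


# Hypothetical corpus sizes (assumption: not taken from the paper).
counts = {"speeches": 120, "essays": 80, "newspaper articles": 200}
baseline = random_baseline_accuracy(counts)
print(f"{float(baseline):.1%}")  # prints 33.3%
```

Under this assumption the expected accuracy is 1/3 regardless of how the documents are distributed over the genres, which is why the corpus sizes do not affect the result.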

Acknowledgements

The work presented in this paper was partially funded by the EC H2020 project RAGE (Realising an Applied Gaming Eco-System), http://www.rageproject.eu/, Grant agreement No 644187.

Corresponding author

Correspondence to Mihai Dascalu.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Balint, M., Dascalu, M., Trausan-Matu, S. (2016). Classifying Written Texts Through Rhythmic Features. In: Dichev, C., Agre, G. (eds) Artificial Intelligence: Methodology, Systems, and Applications. AIMSA 2016. Lecture Notes in Computer Science, vol 9883. Springer, Cham. https://doi.org/10.1007/978-3-319-44748-3_12

  • DOI: https://doi.org/10.1007/978-3-319-44748-3_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-44747-6

  • Online ISBN: 978-3-319-44748-3

  • eBook Packages: Computer Science, Computer Science (R0)
