Abstract
Rhythm analysis of written texts has focused on literary analysis and mainly considers poetry. In this paper we investigate the relevance of rhythmic features for categorizing prose texts pertaining to different genres. Our contribution is threefold. First, we define a set of rhythmic features for written texts. Second, we extract these features from three corpora: speeches, essays, and newspaper articles. Third, we perform feature selection by means of statistical analyses and determine a subset of features that efficiently discriminates between the three genres. We find that, with as few as eight rhythmic features, documents can be assigned to a genre with an accuracy of around 80%, significantly higher than the 33% baseline that results from random assignment.
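The 33% baseline quoted above follows from assigning each document to one of three genres uniformly at random, which gives an expected accuracy of 1/3 whatever the genre proportions are. The short Python sketch below illustrates that arithmetic only. The genre label strings, the helper names, and the toy label list are assumptions made for illustration; none of them come from the paper's corpora or method.

```python
import random
from fractions import Fraction

# Illustrative genre labels (assumed names, not taken from the paper's data).
GENRES = ("speech", "essay", "news")


def chance_accuracy(num_classes: int) -> Fraction:
    """Expected accuracy of uniform random assignment among num_classes labels."""
    if num_classes < 1:
        raise ValueError("num_classes must be at least 1")
    return Fraction(1, num_classes)


def simulate_random_assignment(true_labels, labels, seed=0):
    """Empirical accuracy when every document receives a uniformly random label."""
    rng = random.Random(seed)
    hits = sum(rng.choice(labels) == truth for truth in true_labels)
    return hits / len(true_labels)


# Toy ground truth: 30,000 hypothetical documents cycling through the genres.
true_labels = [GENRES[i % len(GENRES)] for i in range(30000)]

baseline = chance_accuracy(len(GENRES))
simulated = simulate_random_assignment(true_labels, GENRES)

print(f"analytic baseline: {float(baseline):.3f}")
print(f"simulated baseline: {simulated:.3f}")
```

The simulated value should land close to the analytic 1/3, which is the reference point against which the reported accuracy of around 80% is compared.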
Acknowledgements
The work presented in this paper was partially funded by the EC H2020 project RAGE (Realising an Applied Gaming Eco-System), http://www.rageproject.eu/, Grant agreement No 644187.
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Balint, M., Dascalu, M., Trausan-Matu, S. (2016). Classifying Written Texts Through Rhythmic Features. In: Dichev, C., Agre, G. (eds) Artificial Intelligence: Methodology, Systems, and Applications. AIMSA 2016. Lecture Notes in Computer Science, vol 9883. Springer, Cham. https://doi.org/10.1007/978-3-319-44748-3_12
DOI: https://doi.org/10.1007/978-3-319-44748-3_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-44747-6
Online ISBN: 978-3-319-44748-3
eBook Packages: Computer Science, Computer Science (R0)