Abstract
In this paper we start from the theory of Functional Sentence Perspective (FSP), developed primarily by Firbas [1] and Svoboda [12] and later also by Sgall et al. [9].
We attempt to formulate and implement a procedure for Czech that automatically recognizes which sentence constituents carry contextually dependent information that is thus known to the addressee (theme), which carry new information (rheme), and which carry information that is neither thematic nor rhematic (transition).
The experimental implementation of the procedure uses tools developed at the NLP Centre, FI MU, in particular the morphological analyzer Majka [17], the disambiguator DESAMB [16] and the parser SET [5].
As initial data we use a small corpus of 120 Czech sentences, which does not yet include continuous free text. The reason is that we do not use syntactically pre-tagged text but perform syntactic analysis directly with the SET parser. We therefore offer only a very basic evaluation, which captures the main FSP phenomena and shows that the task is feasible.
The toolset developed for the experiment consists of two parts: a chunker, which determines word-order positions from the parse tree of a sentence, and an FSP tagger, which implements the procedure. The tagger labels the chunks with tags for what we hereafter call functional elements (e.g. theme proper, transition, rheme proper). An experimental version is available at http://nlp.fi.muni.cz/~xsvobo15/fsp/fsp.html.
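To make the chunker-then-tagger division concrete, the following minimal Python sketch labels chunks that already carry a word-order position. It is not the procedure described in the paper: the `Chunk` type, the `tag_fsp` function, the plain `theme` and `rheme` labels, and the purely positional rule (pre-verbal chunks are thematic, the finite verb is the transition, the final post-verbal chunk is the rheme proper) are all assumptions made for illustration, and the chunks are written by hand rather than derived from a SET parse tree.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    """Hypothetical chunk record: surface text plus word-order position."""
    text: str
    position: int              # word-order position in the sentence, from 0
    is_finite_verb: bool = False


def tag_fsp(chunks: List[Chunk]) -> List[str]:
    """Assign a toy FSP label to each chunk, in word-order sequence.

    Assumed heuristic (not the authors' rules): everything before the
    finite verb is thematic, the verb is the transition, and everything
    after it is rhematic, with the last chunk as the rheme proper.
    """
    ordered = sorted(chunks, key=lambda c: c.position)
    verb_idx = next(
        (i for i, c in enumerate(ordered) if c.is_finite_verb), None
    )
    if verb_idx is None:
        raise ValueError("no finite verb chunk found")

    last = len(ordered) - 1
    tags = []
    for i, _ in enumerate(ordered):
        if i == verb_idx:
            tags.append("transition")
        elif i < verb_idx:
            # The first pre-verbal chunk is treated as the theme proper.
            tags.append("theme proper" if i == 0 else "theme")
        else:
            # The final post-verbal chunk is treated as the rheme proper.
            tags.append("rheme proper" if i == last else "rheme")
    return tags


# Hand-built chunks for "Petr koupil Marii knihu"
# ("Petr bought Marie a book").
sentence = [
    Chunk("Petr", 0),
    Chunk("koupil", 1, is_finite_verb=True),
    Chunk("Marii", 2),
    Chunk("knihu", 3),
]
for chunk, tag in zip(sentence, tag_fsp(sentence)):
    print(f"{chunk.text}\t{tag}")
```

A positional rule like this only approximates sentences with neutral word order; it ignores context dependence, which the abstract names as the defining property of the theme.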
References
Firbas, J.: On the problem of non-thematic subjects in contemporary English (English summary of “K otázce nezákladových podmětů v současné angličtině”, ib. pp. 22–42 and 165–173). Časopis pro moderní filologii 39, 171–173 (1957)
Firbas, J.: Functional sentence perspective in written and spoken communication. Cambridge University Press (1992) (reprinted 1995)
Hajičová, E., Sgall, P., Skoumalová, H.: An automatic procedure for topic-focus identification. Journal of Computational Linguistics 21(1), 81–94 (1995)
Karlík, P., Svoboda, A.: Skladba češtiny pro cizince (Czech Syntax for Foreigners). Univerzita J.E. Purkyně, Faculty of Arts, Brno (1982)
Kovář, V., Horák, A., Jakubíček, M.: Syntactic analysis using finite patterns: A new parsing system for Czech. In: Human Language Technology: Challenges for Computer Science and Linguistics, pp. 161–171 (2011)
Mathesius, V.: O tak zvaném aktuálním členění větném (On the so-called functional sentence perspective). Slovo a slovesnost 5, 171–174 (1939)
Mikulová, M., Bémová, A., Hajič, J., Hajičová, E., Havelka, J., Kolářová-Řezníčková, V., Kučová, L., Lopatková, M., Pajas, P., Panevová, J., Razímová, M., Sgall, P., Štěpánek, J., Urešová, Z., Veselá, K., Žabokrtský, Z.: Annotation on the tectogrammatical layer in the Prague Dependency Treebank. Tech. rep., ÚFAL MFF UK, Prague, Czech Republic (2005), http://ufal.mff.cuni.cz/pdt2.0/doc/manuals/en/t-layer/html/index.html
Pala, K., Svoboda, O.: Semi-automatic theme-rheme identification. In: Proceedings of the Raslan Workshop, pp. 39–48. Karlova Studánka (2013)
Sgall, P.: Towards a definition of focus and topic. Prague Bulletin of Mathematical Linguistics 31, 32, 3–25, 24–32 (1979, 1980)
Steinberger, R., Bennett, P.: Automatic recognition of theme, focus and contrastive stress. In: Proceedings of the Conference Focus and NLP (1994)
Svoboda, A.: České slovosledné pozice z pohledu aktuálního členění. Slovo a slovesnost 45, 22–34, 88–103 (1984), http://kramerius.lib.cas.cz/search/i.jsp?pid=uuid:c9de3a32-530d-11e1-1418-001143e3f55c
Svoboda, A.: Kapitoly z funkční syntaxe. In: Spisy pedagogické fakulty v Ostravě. vol. 66 (1989)
Veselá, K., Havelka, J.: Anotování aktuálního členění věty v pražském závislostním korpusu, ÚFAL/CKL TR-2003-20 (2003), http://ufal.mff.cuni.cz/pdt2.0/publications/VeselaHavelkaTR2003.pdf
Zikánová, Š., Týnovský, M.: Identification of topic and focus in Czech: Comparative evaluation on Prague Dependency Treebank. In: Studies in Formal Slavic Phonology, Morphology, Syntax, Semantics and Information Structure (Formal Description of Slavic Languages 7), pp. 343–353. Peter Lang, Frankfurt am Main (2009)
Zikánová, Š., Týnovský, M., Havelka, J.: Identification of topic and focus in Czech: Evaluation of manual parallel annotations. The Prague Bulletin of Mathematical Linguistics (87), 61–70 (2007)
Šmerk, P.: Unsupervised learning of rules for morphological disambiguation. In: Sojka, P., Kopeček, I., Pala, K. (eds.) TSD 2004. LNCS (LNAI), vol. 3206, pp. 211–216. Springer, Heidelberg (2004)
Šmerk, P.: Majka – fast morphological analyzer. In: Proceedings of the Raslan Workshop, pp. 13–16. Masarykova Univerzita, Brno (2009)
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Pala, K., Svoboda, O. (2014). An Experiment with Theme–Rheme Identification. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2014. Lecture Notes in Computer Science(), vol 8655. Springer, Cham. https://doi.org/10.1007/978-3-319-10816-2_34
DOI: https://doi.org/10.1007/978-3-319-10816-2_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10815-5
Online ISBN: 978-3-319-10816-2
eBook Packages: Computer Science, Computer Science (R0)