An Experiment with Theme–Rheme Identification

Pala, Karel; Svoboda, Ondřej

doi:10.1007/978-3-319-10816-2_34

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8655)

Included in the following conference series:

International Conference on Text, Speech, and Dialogue

Abstract

In this paper we start from the theory of Functional Sentence Perspective (FSP), developed primarily by Firbas [1] and Svoboda [12], and later also by Sgall et al. [9].

We attempt to formulate and implement a procedure for Czech that automatically recognizes which sentence constituents carry contextually dependent information, and are thus known to the addressee (theme), which constituents contain new information (rheme), and which bear information that is neither thematic nor rhematic (transition).

The experimental implementation of the procedure uses tools developed at the NLP Centre, FI MU, in particular the morphological analyzer Majka [17], the disambiguator DESAMB [16] and the parser SET [5].

As an initial data resource we use a small corpus of 120 Czech sentences, which at the moment does not include continuous free text. This is because we do not use syntactically pre-tagged text but perform syntactic analysis directly with the parser SET. We therefore offer only a very basic evaluation, which captures the main FSP phenomena and shows that the task is feasible.

The toolset developed for the experiment consists of two parts: a chunker, which determines word-order positions from the parse tree of a sentence, and an FSP tagger, which implements the procedure. The tagger labels the chunks with tags for what we call functional elements (e.g. theme proper, transition, rheme proper). An experimental version is available at http://nlp.fi.muni.cz/~xsvobo15/fsp/fsp.html .
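To make the chunker-then-tagger idea concrete, here is a minimal Python sketch of the second stage only. It is not the authors' procedure: the `Chunk` class, the `is_given` flag (standing in for contextual dependence) and the labelling heuristic are all assumptions made for illustration. It supposes the chunks already arrive in word order, treats the finite verb as transition, the first contextually dependent chunk as theme proper, and the last new chunk as rheme proper.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical chunk record; not the data format of the authors' chunker."""
    text: str
    is_verb: bool = False   # chunk is the finite verb
    is_given: bool = False  # assumed flag: contextually dependent information


def tag_fsp(chunks):
    """Toy labelling heuristic over chunks listed in word order (an assumption)."""
    non_verb = [i for i, c in enumerate(chunks) if not c.is_verb]
    given = [i for i in non_verb if chunks[i].is_given]
    new = [i for i in non_verb if not chunks[i].is_given]
    theme_proper = given[0] if given else None  # first given chunk
    rheme_proper = new[-1] if new else None     # last new chunk

    labels = []
    for i, chunk in enumerate(chunks):
        if chunk.is_verb:
            labels.append("transition")
        elif i == theme_proper:
            labels.append("theme proper")
        elif i == rheme_proper:
            labels.append("rheme proper")
        elif chunk.is_given:
            labels.append("theme")
        else:
            labels.append("rheme")
    return labels


# "Petr koupil nové auto" (Petr bought a new car), with "Petr" assumed known.
sentence = [
    Chunk("Petr", is_given=True),
    Chunk("koupil", is_verb=True),
    Chunk("nové auto"),
]
for chunk, label in zip(sentence, tag_fsp(sentence)):
    print(f"{chunk.text}: {label}")
```

In this sketch the contextual-dependence flag is supplied by hand; deciding it automatically is the hard part of the task described above and is not attempted here.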

References

[1] Firbas, J.: On the problem of non-thematic subjects in contemporary English (English summary of “K otázce nezákladových podmětů v současné angličtině”, ib. pp. 22–42 and 165–173). Časopis pro moderní filologii 39, 171–173 (1957)
[2] Firbas, J.: Functional sentence perspective in written and spoken communication. Cambridge University Press (1992) (reprinted 1995)
[3] Hajičová, E., Sgall, P., Skoumalová, H.: An automatic procedure for topic-focus identification. Journal of Computational Linguistics 21(1), 81–94 (1995)
[4] Karlík, P., Svoboda, A.: Skladba češtiny pro cizince (Czech Syntax for Foreigners). Univerzita J.E. Purkyně, Faculty of Arts, Brno (1982)
[5] Kovář, V., Horák, A., Jakubíček, M.: Syntactic analysis using finite patterns: A new parsing system for Czech. In: Human Language Technology: Challenges for Computer Science and Linguistics, pp. 161–171 (2011)
[6] Mathesius, V.: O tak zvaném aktuálním členění větném (On the so-called functional sentence perspective). Slovo a slovesnost 5, 171–174 (1939)
[7] Mikulová, M., Bémová, A., Hajič, J., Hajičová, E., Havelka, J., Kolářová-Řezníčková, V., Kučová, L., Lopatková, M., Pajas, P., Panevová, J., Razímová, M., Sgall, P., Štěpánek, J., Urešová, Z., Veselá, K., Žabokrtský, Z.: Annotation on the tectogrammatical layer in the Prague Dependency Treebank. Tech. rep., ÚFAL MFF UK, Prague, Czech Republic (2005), http://ufal.mff.cuni.cz/pdt2.0/doc/manuals/en/t-layer/html/index.html
[8] Pala, K., Svoboda, O.: Semi-automatic theme-rheme identification. In: Proceedings of the Raslan Workshop, pp. 39–48. Karlova Studánka (2013)
[9] Sgall, P.: Towards a definition of focus and topic. Prague Bulletin of Mathematical Linguistics 31, 32, 3–25, 24–32 (1979, 1980)
[10] Steinberger, R., Bennett, P.: Automatic recognition of theme, focus and contrastive stress. In: Proceedings of the Conference Focus and NLP (1994)
[11] Svoboda, A.: České slovosledné pozice z pohledu aktuálního členění. Slovo a slovesnost 45, 22–34, 88–103 (1984), http://kramerius.lib.cas.cz/search/i.jsp?pid=uuid:c9de3a32-530d-11e1-1418-001143e3f55c
[12] Svoboda, A.: Kapitoly z funkční syntaxe. In: Spisy pedagogické fakulty v Ostravě. vol. 66 (1989)
[13] Veselá, K., Havelka, J.: Anotování aktuálního členění věty v pražském závislostním korpusu, ÚFAL/CKL TR-2003-20 (2003), http://ufal.mff.cuni.cz/pdt2.0/publications/VeselaHavelkaTR2003.pdf
[14] Zikánová, Š., Týnovský, M.: Identification of topic and focus in Czech: Comparative evaluation on Prague Dependency Treebank. In: Studies in Formal Slavic Phonology, Morphology, Syntax, Semantics and Information Structure (Formal Description of Slavic Languages 7), pp. 343–353. Peter Lang, Frankfurt am Main (2009)
[15] Zikánová, Š., Týnovský, M., Havelka, J.: Identification of topic and focus in Czech: Evaluation of manual parallel annotations. The Prague Bulletin of Mathematical Linguistics (87), 61–70 (2007)
[16] Šmerk, P.: Unsupervised learning of rules for morphological disambiguation. In: Sojka, P., Kopeček, I., Pala, K. (eds.) TSD 2004. LNCS (LNAI), vol. 3206, pp. 211–216. Springer, Heidelberg (2004)
[17] Šmerk, P.: Majka – fast morphological analyzer. In: Proceedings of the Raslan Workshop, pp. 13–16. Masarykova Univerzita, Brno (2009)

Author information

Authors and Affiliations

Natural Language Processing Centre, Faculty of Informatics, Faculty of Arts, Masaryk University, Botanická 68a, 602 00, Brno, Czech Republic
Karel Pala & Ondřej Svoboda

Editor information

Editors and Affiliations

Faculty of Informatics, Masaryk University, Botanická 68a, 602 00, Brno, Czech Republic
Petr Sojka
Faculty of Informatics, Department of Information Technologies, Masaryk University, 602 00, Brno, Czech Republic
Aleš Horák, Ivan Kopeček & Karel Pala

About this paper

Cite this paper

Pala, K., Svoboda, O. (2014). An Experiment with Theme–Rheme Identification. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2014. Lecture Notes in Computer Science, vol 8655. Springer, Cham. https://doi.org/10.1007/978-3-319-10816-2_34

DOI: https://doi.org/10.1007/978-3-319-10816-2_34
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-10815-5
Online ISBN: 978-3-319-10816-2
eBook Packages: Computer Science (R0)
