Constructions at argument-structure level in the SenSem Corpora

  • Original Paper
Language Resources and Evaluation

Abstract

In this paper we present the annotation scheme for constructions at the argument-structure level in the Spanish and Catalan SenSem Corpora. Constructions are treated as form-meaning pairs, following the theoretical underpinnings of Construction Grammar. Regarding meaning, we propose a hierarchy of constructions whose highest level takes into account the prominence of the logical subject in the sentence. Thus, we differentiate between topicalized and detopicalized sentences, an innovative proposal that resolves some terminological issues related to pronominal constructions in Spanish. We further develop this classification by taking into account the semantic relation of the logical subject to the verb and its coindexation, if any, with other participants. As regards form, the basic features we consider are syntagmatic categories and syntactic functions. Furthermore, we annotate the form the verb requires, that is, whether it requires a pronoun in order to convey a particular meaning. Another relevant contribution is the annotation of some linguistic phenomena not taken into account in other similar resources, such as reciprocal, dative or impersonal constructions. Finally, we present the frequencies of all these constructions in Spanish.

Notes

  1. These corpora are the result of work carried out by the research team over 10 years (2004–2014) in five different projects. ‘Standardization and Transference of Lexical and Textual Resources - Ministerio de Ciencia e Innovación - FFI2011-27774’ is the current project.

  2. http://grial.uab.es/sensem/corpus.

  3. http://grial.uab.es/sensem/subcats.

  4. http://grial.uab.es/sensem/lexico.

  5. http://grial.uab.es/descarregues.

  6. In “Appendix 3” we present the translation into Catalan of the Spanish examples in Sect. 2.

  7. http://verbs.colorado.edu/~mpalmer/projects/ace.html.

  8. https://framenet.icsi.berkeley.edu/fndrupal/.

  9. The number of words in the Spanish FrameNet corpus is not available.

  10. http://arts-ccr-002.bham.ac.uk/ccr/patgram/.

  11. https://framenet2.icsi.berkeley.edu/frameSQL/cxn/CxNeng/cxn00/21colorTag/index.html.

  12. The Catalan corpus was built by translating the Spanish journalistic subcorpus. Translation, usually of Spanish news into Catalan, is common practice in newspapers published in both languages because the two languages are so similar.

  13. In “Appendix 2” we provide the list of books used in the Spanish literary corpus.

  14. The annotation at clause level is restricted to the subject and the complements/adjuncts of the main verb. If a participant is expressed by another clause, that clause is not annotated internally. In a sentence such as “Zapatero afirmó que apoyará el proyecto de reforma que salga del Parlament.” (Zapatero said he would support the reform approved by the Parliament), if we are tagging the sentence as an example of afirmó ‘said’, we do not further annotate the internal structure of the clause whose main verb is apoyará ‘will support’. Conversely, if we were annotating sentences that exemplify the use of the verb apoyará ‘will support’, we would not annotate any constituents of the former verb.

  15. In “Appendix 1” we present only the semantic tags used. Subcategorization patterns are not presented but can be consulted online.

  16. We have opted for including a more literal translation in those cases where there is a structural mismatch.

  17. In the Spanish sentence, verb agreement corresponds to a 3rd person plural, so we have added the pronoun they as subject in the literal translation. Nevertheless, this subject would not refer to a specific individual or group of individuals in the real world. For this reason, the clause is considered impersonal rather than personal in Spanish. In English we have opted for using the Spanish object as subject and adding the verb need + ing in order to keep the impersonality.

  18. We use parentheses to indicate that the subject is not expressed in Spanish, as in (2b). The difference between (1b) and (2b) is that only the former is impersonal.

  19. The underlined constituents in (2) correspond to the grammatical subject.

  20. This pronoun (se) is empty and non-lexical, since it is neither coindexed with a participant nor is it part of the verb.

  21. Traditional grammar in Spanish defines verbal periphrases as a group of verbs (a verb sequence), in which the auxiliary expresses an aspectual or modal meaning (1c). At this level of annotation in our project we use the term periphrastic form in a wider sense, including the so-called syntactic or periphrastic passives (3), since in these constructions two verbs are also used to convey the information (the verb ser ‘be’ and the lexical verb).

  22. Some works on discourse (Givon 1981) treat the term Topic as a synonym of Theme, and a topicalized construction is defined as one in which the logical subject occupies the position of syntactic subject, that is, any unmarked prototypical statement regardless of the order of the other constituents within the clause. In these constructions the logical subject is the Theme or Topic, and the verb and its complement are the Rheme. Also, again following Givon (1981), we use the term detopicalization to refer to those constructions in which the logical subject is either not expressed, because it is generalized, or else occupies a non-subject function (for example, in passive constructions, in which the logical subject is the agent complement expressed by a PP).

  23. Note that the adjunct refunfuñando ‘grumblingly’ has not been annotated with a semantic label (a translation is provided in note 16).

  24. Cited in De Miguel and Fernández Lagunilla (2000).

  25. Both participants, subject and object, refer to the same real-world entity. In the case of the reflexive, this reference is simple. In the case of prototypical reciprocal constructions, given the complex nature of a reciprocal event, we have a crossed coindexation, i.e. the subject of event 1 occupies the spot of the object in event 2 and vice versa.

  26. For specific information about reciprocity in Spanish, see Fernández-Montraveta and Vázquez (submitted).

  27. This construction is typical of Spanish and Catalan and is not related to the construction known as Dative in English.

  28. Active sentences show a similar representation in BDS-ADESSE: 82.01 %.

  29. In the frequency of all the constructions is presented.

  30. In BDS-ADESSE reflexivity is also very sparsely represented (1.64 %), although more so than in SenSem.

  31. Syntactic passive represents 1.40 % in BDS-ADESSE.

  32. In pattern 6 the figures are almost the same in both languages.

  33. It agrees in number and person with the verb in this kind of construction.

  34. In BDS-ADESSE impersonality (0.73 %) is lower than in SenSem.

  35. By default all the other verbal lexical items (those that are neither pronominal nor periphrastic) are [lexical verb forms].

  36. Some of these thematic roles have been further subclassified, especially Theme, which is a rather vague semantic category, but here we present only the more general ones. The meaning of each role has been presented in Sect. 4.2, except for Perceiver, which refers to the participant who perceives physical actions, such as to see, to smell… Most of the roles used in the project are very similar to those used in VerbNet, and the equivalences between them, and also with other proposals (PropBank or Lirics), are available on the project website.
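
The distinction drawn in note 25 between simple and crossed coindexation can be illustrated with a minimal Python sketch. The `Event` record and the `coindexation` function are hypothetical names introduced only for illustration; they do not reflect the format in which SenSem stores its annotations.

```python
from typing import NamedTuple, Sequence


class Event(NamedTuple):
    # Hypothetical representation: each event has one subject and one object,
    # identified here by the name of the entity they refer to.
    subj: str
    obj: str


def coindexation(events: Sequence[Event]) -> str:
    """Label a construction by how its subject and object are coindexed."""
    # Reflexive: a single event whose subject and object are the same entity.
    if len(events) == 1 and events[0].subj == events[0].obj:
        return "reflexive"
    # Reciprocal: two events with crossed coindexation, i.e. the subject of
    # event 1 is the object of event 2 and vice versa.
    if len(events) == 2:
        e1, e2 = events
        crossed = e1.subj == e2.obj and e1.obj == e2.subj
        if crossed and e1.subj != e1.obj:
            return "reciprocal"
    return "none"


# "Armstrong i Crow es van conèixer": each participant met the other.
met = [Event("Armstrong", "Crow"), Event("Crow", "Armstrong")]
print(coindexation(met))
```

Under these assumptions, a reciprocal sentence is modelled as two sub-events, which is one simple way of making the "crossed" nature of the reference explicit.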

References

  • Alonso, L., Capilla, J. A., Castellón, I., Fernández, A., & Vázquez, G. (2007). The SenSem project: Syntactico-semantic annotation of sentences in Spanish. In N. Nikolov, K. Bontcheva, G. Angelova & R. Mitkov (Eds.), Recent advances in natural language processing IV. Selected papers from RANLP 2005. Current Issues in Linguistic Theory 292 (pp. 89–98). John Benjamins Publishing Co.

  • Aparicio, J., Taulé, M., & Martí, M. A. (2008). Ancora: A lexical resource for the semantic annotation of corpora. In Proceedings of 6th international conference on language resources and evaluation, pp. 797–802. Marrakesh.

  • Cifuentes, J. L. (1999). Sintaxis y semántica del movimiento. Aspectos de gramática cognitiva. Instituto de Cultura Juan Gil-Albert, Alicante.

  • Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20, 37–46.

  • Comrie, B. (1976). Aspect. Cambridge: Cambridge University Press.

  • De Miguel, E., & Fernández Lagunilla, M. (2000). El operador aspectual ‘se’. Revista Española de Lingüística, 30(1), 13–43.

  • Fernández-Montraveta, A., & Vázquez, G. (2014). The SenSem Corpus: An annotated corpus for Spanish and Catalan with information about aspectuality, modality, polarity and factuality. Corpus Linguistics and Linguistic Theory, 10(2), 273–288.

  • Fernández-Montraveta, A., & Vázquez, G. (submitted). The event structure of reciprocal verbs and its implications for bidirectionality.

  • Fillmore, C. J. (1976). Frame semantics and the nature of language. In Annals of the New York Academy of Sciences: Conference on the origin and development of language and speech, vol. 280, pp. 20–32.

  • Fillmore, C. J., Lee-Goldman, R., & Rhodes, R. (2012). The FrameNet constructicon. In H. C. Boas & I. Sag (Eds.), Sign-based construction grammar (pp. 283–299). Stanford CA: CSLI.

  • García de Miguel, J. M., & Comesaña, S. (2004). Verbs of cognition in Spanish: Constructional schemas and reference points. In A. Silva, A. Torres, & M. Gonçalves (Eds.), Linguagem, Cultura e Cognição: Estudos de Linguística Cognitiva (pp. 399–420). Coimbra: Almedina.

  • Givon, T. (1981). Typology and functional domains. Studies in Language, 5(2), 163–193.

  • Goldberg, A. E. (1995). Constructions: A construction grammar approach to argument structure. Chicago: University of Chicago Press.

  • Gràcia, L. (1989). Els verbs ergatius en català. Maó, Menorca: Institut Menorquí d’Estudis.

  • Hale, K., & Keyser, S. J. (1986). Some transitivity alternations in English. Lexicon project working papers, 7. Center for Cognitive Science, Cambridge, Massachusetts.

  • Keyser, S. J., & Roeper, T. (1984). On the middle and ergative constructions in English. Linguistic Inquiry, 15, 381–416.

  • Kingsbury, P., Palmer, M., & Marcus, M. (2002). Adding semantic annotation to the Penn TreeBank. In Proceedings of the human language technology conference. San Diego, California.

  • Levin, B. (1993). English verb classes and alternations. A preliminary investigation. Chicago, London: University of Chicago Press.

  • Lyngfelt, B., Borin, L., Forsberg, M., Prentice, J., Rydstedt, R., Sköldberg, E., & Tingsell, S. (2012). Adding a constructicon to the Swedish resource network of Språkbanken. In Proceedings of KONVENS 2012 (LexSem 2012 workshop), pp. 452–461.

  • Palmer, M., Kingsbury, P., & Gildea, D. (2005). The proposition bank: An annotated corpus of semantic roles. Computational Linguistics, 31(1), 71–106.

  • Rojo, G. (2001). La explotación de la Base de Datos Sintácticos del español actual. In J. De Kock (Ed.), Lingüística con corpus (pp. 255–286). Salamanca: Universidad de Salamanca.

  • Ruppenhofer, J., Ellsworth, M., Petruck, M. R. L., Johnson, C. R. & Scheffczyk, J. (2010). FrameNet II: Extended theory and practice. https://FrameNet2.icsi.berkeley.edu/docs/r1.5/book.pdf. Accessed February 2013.

  • Sanz, M. (1995). Telic clitics in Spanish. Rochester: University of Rochester.

  • Subirats-Rüggeberg, C., & Petruck, M. R. L. (2003). Surprise: Spanish FrameNet! In Proceedings of the international congress of linguists (workshop on Frame Semantics), Prague. http://www.icsi.berkeley.edu/pubs/ai/subirats-petruck.pdf. Accessed February 2013.

  • Taulé, M., Martí, M. A., & Recasens, M. (2008). Ancora: Multilevel annotated corpora for Catalan and Spanish. In Proceedings of 6th international conference on language resources and evaluation, pp. 96–101.

Author information

Corresponding author

Correspondence to Ana Fernández-Montraveta.

Appendices

Appendix 1: Typology of constructions

  (a) Form

    1. Identification of a syntactic subject:

      • Positive: Non-impersonal construction

      • Negative: Impersonal construction

    2. Identification of the verbal form (see note 35):

      • Pronominal verb form

      • Periphrastic verb form

  (b) Meaning

    (I) Identification of the logical subject (semantic function) and the syntactic subject (syntactic function):

      • Coincidence of logical subject and syntactic subject: Topicalized construction

      • Non-coincidence of logical subject and syntactic subject: Detopicalized construction

    (II) Type of semantic role (de)topicalized (according to the information provided in the lexicon; see note 36):

      • Agent (de)topicalized construction

      • Cause (de)topicalized construction

      • Experiencer (de)topicalized construction

      • Perceiver (de)topicalized construction

      • Theme (de)topicalized construction

      • Initiator (de)topicalized construction

    (III) Specific constructions:

      (a) Aspectuality:

        • Middle construction

        • Habitual construction

        • Telic construction

      (b) Coindexation of arguments:

        • Reflexive construction

        • Reciprocal construction

      (c) Specific adjuncts:

        • Dative construction
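
As an illustration of how the form and meaning levels of this typology combine, the following Python sketch derives the general labels for a clause from a few features. It is only a sketch under stated assumptions: the `Clause` record, its field names and the `describe` function are hypothetical and are not part of the SenSem annotation tools or data format. Only the label strings and the default [lexical verb form] (note 35) come from the typology above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Role(Enum):
    AGENT = "Agent"
    CAUSE = "Cause"
    EXPERIENCER = "Experiencer"
    PERCEIVER = "Perceiver"
    THEME = "Theme"
    INITIATOR = "Initiator"


class VerbForm(Enum):
    LEXICAL = "Lexical"            # default for non-pronominal, non-periphrastic verbs
    PRONOMINAL = "Pronominal"
    PERIPHRASTIC = "Periphrastic"


@dataclass(frozen=True)
class Clause:
    # Hypothetical feature bundle; field names are illustrative only.
    has_syntactic_subject: bool
    logical_subject_is_subject: bool
    logical_subject_role: Role
    verb_form: VerbForm = VerbForm.LEXICAL


def describe(clause: Clause) -> List[str]:
    """Return the form and meaning labels for a clause, most general first."""
    if not clause.has_syntactic_subject and clause.logical_subject_is_subject:
        raise ValueError("no syntactic subject for the logical subject to coincide with")
    labels = []
    # (a) Form: syntactic subject and verbal form.
    labels.append("Non-impersonal construction" if clause.has_syntactic_subject
                  else "Impersonal construction")
    labels.append(f"{clause.verb_form.value} verb form")
    # (b) Meaning: (I) topicalization, then (II) the role concerned.
    kind = "topicalized" if clause.logical_subject_is_subject else "detopicalized"
    labels.append(f"{kind.capitalize()} construction")
    labels.append(f"{clause.logical_subject_role.value} {kind} construction")
    return labels


# A periphrastic passive: the syntactic subject is not the logical subject.
passive = Clause(True, False, Role.AGENT, VerbForm.PERIPHRASTIC)
print(describe(passive))
```

The specific constructions under (III) are left out of the sketch, since deciding among them requires aspectual and coindexation information beyond these four features.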

Appendix 2: Sources

  (A) Journalistic:

    a. El Periódico de Catalunya

    b. La Vanguardia

  (B) Literary: see Table 8.

Table 8 Main works that comprise the literary subcorpus in SenSem

Appendix 3: Translation of the examples into Catalan

Figures 1 and 2: – (literary register)

    a. Estem d’acord que arreglin la zona perquè fa vergonya com està.

    b. En bici es pot arribar a tots els llocs.

    c. S’ha de controlar el partit.

  (1)
    a. El Govern basc va elaborar un pla per eradicar el problema.

    b. Va contactar amb ell i li va oferir un lloc a casa seva.

    c. – (literary register)

  (2) La víctima va ser tocada en un braç, l’abdomen, el glutis i una cama.

  (3) – (literary register)

  (4) La residència té 78 places en habitacions habitualment ocupades per investigadors estrangers.

  (5) En arribar a la porta de l’habitació de la senyora, al terra al costat de la porta, va trobar una nota.

  (6) S’ha anunciat el propòsit d’entaular un ampli diàleg polític amb els grups parlamentaris.

  (7) Amb un major nombre de glòbuls vermells es facilita l’oxigenació i es guanya en resistència i recuperació.

  (8) I què es pretén amb l’acte de demà?

  (9) No només s’hi perden diners, també es perd el temps.

  (10) Bab Boujloud és una porta monumental que s’obre sobre una explanada immensa de l’època almoràvit.

  (11) Als que ens agrada la muntanya, sovint ens apropem a les zones del Pirineu per conèixer racons que encara no ha destruït la pressió urbanística.

  (12) Això el porta, mentre es menja una taronja, a fer una reflexió sobre els impostos, la gasolina, els polítics.

  (13) – (literary register)

  (14) Armstrong i Crow es van conèixer la tardor passada a Las Vegas, en una festa organitzada pel tennista Andre Agassi.

  (15) El carnet jove fa anys que no me’l renoven.

About this article

Cite this article

Vázquez, G., Fernández-Montraveta, A. Constructions at argument-structure level in the SenSem Corpora. Lang Resources & Evaluation 49, 637–658 (2015). https://doi.org/10.1007/s10579-015-9309-4
