Abstract
Levin-style classes, which capture the shared syntax and semantics of verbs, have proven useful for many Natural Language Processing (NLP) tasks and applications. However, lexical resources that provide information about such classes are available for only a handful of the world's languages. Because manual development of such resources is extremely time-consuming and cannot reliably capture domain variation in classification, methods for automatic induction of verb classes from text have gained popularity. To date, however, such methods have been applied only to English and a handful of other, mainly resource-rich, languages. In this paper, we apply these methods to Brazilian Portuguese, a language for which neither a VerbNet nor any automatic class induction work exists yet. Since Levin-style classification is said to have a strong cross-linguistic component, we use unsupervised clustering techniques similar to those developed for English, without language-specific feature engineering. This yields interesting results that line up well with those obtained for other languages, demonstrating the cross-linguistic nature of this type of classification. However, we also discover and discuss issues that require specific consideration when aiming to optimise the performance of verb clustering for Brazilian Portuguese and other less-resourced languages.
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Scarton, C., Sun, L., Kipper-Schuler, K., Duran, M.S., Palmer, M., Korhonen, A. (2014). Verb Clustering for Brazilian Portuguese. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2014. Lecture Notes in Computer Science, vol 8403. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-54906-9_3
DOI: https://doi.org/10.1007/978-3-642-54906-9_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-54905-2
Online ISBN: 978-3-642-54906-9
eBook Packages: Computer Science (R0)