Abstract
This paper presents the process of bootstrapping a computational lexicon of multiword adverbs for Brazilian Portuguese (PT-BR) from an existing lexicon built for the European variety of the language (PT-PT). This ongoing work aims to identify, collect, and syntactically describe multiword adverbs in PT-BR, in order to produce a comprehensive lexicon of multiword adverbs in Portuguese. First, existing resources for this part of speech are presented, followed by the methods adopted for building this novel resource. So far, approximately 700 new PT-BR multiword adverbs have entered the lexicon, which now totals nearly 2,300 entries. We assessed this new lexical resource against a sample of 1,000 sentences taken from a publicly available corpus of Brazilian Portuguese journalistic texts. Results are promising, although there is still room for improvement, given that the F-measure reached only 0.66. We estimate that another 2,100 PT-BR adverbs will enter the lexicon, bringing the total to more than 4,000 multiword adverbs in Portuguese.
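For readers unfamiliar with the F-measure reported above, the following is a minimal sketch of how precision, recall, and F-measure are commonly computed when lexicon matches are compared with gold annotations. The abstract does not describe the evaluation protocol, so the exact-match criterion, the span representation `(sentence_id, start_token, end_token)`, the function name `f_measure`, and the toy data are all assumptions made for illustration, not the authors' setup or results.

```python
def f_measure(predicted, gold, beta=1.0):
    """Return (precision, recall, F-beta) for two collections of spans.

    Assumption: a predicted span counts as correct only if it matches
    a gold span exactly. The paper may use a different criterion.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    b2 = beta * beta
    f_score = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f_score


# Hypothetical toy data, not taken from the paper:
# each span is (sentence_id, start_token, end_token).
gold = {(1, 0, 2), (2, 3, 5), (3, 1, 2), (4, 0, 1)}
predicted = {(1, 0, 2), (2, 3, 5), (5, 2, 4)}

precision, recall, f1 = f_measure(predicted, gold)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

With `beta=1.0` the score is the harmonic mean of precision and recall, so a low value on either side pulls the F-measure down.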
Notes
- 1.
https://gitlab.com/parseme/parseme_corpus_pt (July 29, 2022). All the URLs in this paper were verified on this date.
Acknowledgements
Research for this paper was partially supported by national funds through FCT, Fundação para a Ciência e a Tecnologia, under project UIDB/50021/2020.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Müller, I., Mamede, N., Baptista, J. (2022). Bootstrapping a Lexicon of Multiword Adverbs for Brazilian Portuguese. In: Corpas Pastor, G., Mitkov, R. (eds) Computational and Corpus-Based Phraseology. EUROPHRAS 2022. Lecture Notes in Computer Science, vol 13528. Springer, Cham. https://doi.org/10.1007/978-3-031-15925-1_12
DOI: https://doi.org/10.1007/978-3-031-15925-1_12
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-15924-4
Online ISBN: 978-3-031-15925-1
eBook Packages: Computer Science, Computer Science (R0)