Bootstrapping a Lexicon of Multiword Adverbs for Brazilian Portuguese

  • Conference paper
Computational and Corpus-Based Phraseology (EUROPHRAS 2022)

Abstract

This paper presents the process of bootstrapping a computational lexicon of multiword adverbs for Brazilian Portuguese (PT-BR) from an existing lexicon built for the European variety of the language (PT-PT). This ongoing work aims to identify, collect, and provide a syntactic description of multiword adverbs in PT-BR, in order to produce a comprehensive lexicon of multiword adverbs in Portuguese. First, existing resources for this part of speech are presented, followed by the methods adopted for building this novel resource. So far, approximately 700 new PT-BR multiword adverbs have entered the lexicon, bringing the total to nearly 2,300 entries. We assessed this new lexical resource against a sample of 1,000 sentences taken from a publicly available corpus of Brazilian Portuguese journalistic texts. Results are promising, although there is still room for improvement, given that the F-measure reached only 0.66. We estimate that another 2,100 PT-BR adverbs will enter the lexicon, for a total of more than 4,000 multiword adverbs in Portuguese.
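
For reference, the F-measure mentioned in the abstract combines precision and recall into a single score. The following is a minimal Python sketch of how such a score could be computed from span-level matches. It assumes that gold and predicted adverbs are represented as (sentence id, start token, end token) triples and that only exact matches count. These assumptions, the function name `evaluate_spans`, and the toy data are illustrative only and are not taken from the paper's evaluation protocol.

```python
from typing import Set, Tuple

# Hypothetical representation: (sentence id, start token, end token).
Span = Tuple[int, int, int]


def evaluate_spans(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under exact span matching."""
    true_pos = len(gold & predicted)    # spans found in both sets
    false_pos = len(predicted - gold)   # predicted but not annotated
    false_neg = len(gold - predicted)   # annotated but not predicted
    precision = true_pos / (true_pos + false_pos) if predicted else 0.0
    recall = true_pos / (true_pos + false_neg) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data, invented for illustration only (not the paper's results).
gold = {(1, 0, 2), (1, 5, 7), (2, 3, 4), (3, 1, 3)}
predicted = {(1, 0, 2), (2, 3, 4), (4, 0, 1)}
precision, recall, f1 = evaluate_spans(gold, predicted)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

Partial-overlap matching or per-category scores would require a different comparison than the set intersection used in this sketch.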

Notes

  1. https://gitlab.com/parseme/parseme_corpus_pt (July 29, 2022; all URLs in this paper were verified on this date).

  2. https://gitlab.com/parseme/parseme_corpus_pt/-/raw/master/pt_gsd-ud-train.cupt.

  3. https://typo.uni-konstanz.de/parseme/index.php/the-action/about-cost.

  4. https://dicionario.priberam.org.

  5. https://houaiss.uol.com.br/.

  6. https://www.linguateca.pt/cetempublico.

  7. https://www.linguateca.pt/acesso/corpus.php?corpus=SAOCARLOS.

  8. https://unitexgramlab.org/.

  9. https://gitlab.com/parseme/parseme_corpus_pt.

Acknowledgements

Research for this paper was partially supported by national funds through FCT, Fundação para a Ciência e a Tecnologia, under project UIDB/50021/2020.

Author information

Corresponding author

Correspondence to Izabela Müller.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Müller, I., Mamede, N., Baptista, J. (2022). Bootstrapping a Lexicon of Multiword Adverbs for Brazilian Portuguese. In: Corpas Pastor, G., Mitkov, R. (eds) Computational and Corpus-Based Phraseology. EUROPHRAS 2022. Lecture Notes in Computer Science, vol 13528. Springer, Cham. https://doi.org/10.1007/978-3-031-15925-1_12

  • DOI: https://doi.org/10.1007/978-3-031-15925-1_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-15924-4

  • Online ISBN: 978-3-031-15925-1

  • eBook Packages: Computer Science, Computer Science (R0)
