Brazilian Portuguese corpora for teaching and translation: the CoMET project

  • Project Notes
Language Resources and Evaluation

Abstract

This paper begins with an overview of the corpora available for Brazilian Portuguese and then focuses on the CoMET Project, developed at the University of São Paulo. CoMET consists of three corpora: a comparable Portuguese-English technical corpus (CorTec), a Portuguese-English parallel (translation) corpus (CorTrad), and a multilingual learner corpus (CoMAprend), all available for online queries with specific tools. CorTec offers over fifty corpora in a variety of domains, from Health Sciences to Olympic Games. CorTrad is divided into three parts: Popular Science, Technical-Scientific and Literary. Each of CoMET’s corpora is presented in detail, with examples.

Notes

  1. COCA: https://www.english-corpora.org/coca/.

  2. Corpus do Português: NOW: https://www.corpusdoportugues.org/now/.

Author information

Corresponding author

Correspondence to Stella E. O. Tagnin.

About this article

Cite this article

Tagnin, S.E.O. Brazilian Portuguese corpora for teaching and translation: the CoMET project. Lang Resources & Evaluation 58, 347–361 (2024). https://doi.org/10.1007/s10579-023-09690-z

  • DOI: https://doi.org/10.1007/s10579-023-09690-z
