Abstract
This article identifies and addresses the major linguistic/conceptual, as opposed to logistic, issues faced in the morphosyntactic tagging of MAC-Morpho, a 1.1 million word Brazilian Portuguese corpus of newspaper articles that has been developed in the Lacio-Web Project. Rather than simply presenting the annotated corpus and describing its tagset, we elaborate on the criteria for establishing the tagset and analyze some interesting cases amongst the linguistic problems we faced in this work.
This project is partially funded by Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq). We are grateful to E. Bick for parsing MAC-Morpho.
Preview
Unable to display preview. Download preview PDF.
Similar content being viewed by others
References
Marques, N.C., Lopes, J.G.P.: A Neural Network Approach to Portuguese Part-of-Speech Tagging. Anais do II Encontro para o Processamento Computacional de Português Escrito e Falado (1996) 1–9
Villavicencio, A., Viccari, R.M., Villavicencio, F.: Evaluating Part-of-Speech Taggers for the Portuguese Language. Anais do II Encontro para o Processamento Computacional de Português Escrito e Falado (1996) 159–167
Aires, R.V.X., Aluísio, S.M., Kuhn, D.C.S., Andreeta, M.L.B., Oliveira Jr., O.N.: Combining Multiple Classifiers to Improve Part of Speech Tagging: A Case Study for Brazilian Portuguese. Proceedings of SBIA’2000 (2000) 20–22
Bick, E.: The Parsing System “Palavras”: Automatic Grammatical Analysis of Portuguese in a Constraint Grammar Framework. Aarhus: Aarhus University Press (2000).
Aluísio, S. et al.: An account of the challenge of tagging a reference corpus of Brazilian Portuguese. Technical Report 188 — ICMC-USP (2003). Also Available at http://www.nilc.icmc.usp.br/~lacio_web/
Macleod, C., Ide, N., Grishman, R.: The American National Corpus: Standardized Resources for American English. Proceedings of the Second Language Resources and Evaluation Conference (LREC) (2000) 831–36
Galves, C., Britto, H.: A Construção do Corpus Anotado do Português Histórico Tycho Brahe: O sistema de anotação morfológica. Proceedings of PROPOR 99 (1999) 81–92.
Déjean, H.: How to Evaluate and Compare Tagsets? A Proposal. Proceedings of the Second Language Resources and Evaluation Conference (LREC) (2000). Also available at http://www.sfb441.uni-tuebingen.de/~dejean/
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Aluísio, S., Pelizzoni, J., Marchi, A.R., de Oliveira, L., Manenti, R., Marquiafável, V. (2003). An Account of the Challenge of Tagging a Reference Corpus for Brazilian Portuguese. In: Mamede, N.J., Trancoso, I., Baptista, J., das Graças Volpe Nunes, M. (eds) Computational Processing of the Portuguese Language. PROPOR 2003. Lecture Notes in Computer Science(), vol 2721. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45011-4_17
Download citation
DOI: https://doi.org/10.1007/3-540-45011-4_17
Published:
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40436-1
Online ISBN: 978-3-540-45011-5
eBook Packages: Springer Book Archive