Performance Analysis of a POS Tagger applied to Discharge Summaries in Portuguese

Oleynik, Michel; Nohama, Percy; Cancian, Pindaro Secco; Schulz, Stefan

doi:10.3233/978-1-60750-588-4-959

Abstract

Part-of-speech (POS) taggers need a considerable amount of data to train their models, and such data are not readily available for medical texts in Portuguese. We evaluated the accuracy of a morphological tagger against a gold standard when trained on corpora of different sizes and domains. Training on a medical corpus yielded the highest accuracy throughout the training process, reaching 91.5%, whereas training on a newswire corpus reached only 75.3%. We also adapted an active learning technique to the POS tagging task: a committee of POS taggers isolates the sentences with the highest disagreement indexes for manual correction. However, this method did not reduce training and tagging times compared with a random selection strategy. We recommend that future work invest in annotating a small amount of randomly selected data from the domain of study, which should be enough to achieve higher accuracy.
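
The committee-based selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the disagreement index is computed, so this example assumes mean per-token vote entropy, a common choice in query-by-committee methods. The function names and the representation of a tagger as a callable from a token list to a tag list are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def vote_entropy(tag_votes):
    """Entropy (in bits) of the committee's votes for one token.

    Assumed disagreement measure; it is 0.0 when all taggers agree.
    """
    counts = Counter(tag_votes)
    total = len(tag_votes)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def sentence_disagreement(committee_tags):
    """Mean per-token vote entropy for one sentence.

    committee_tags holds one tag sequence per committee member,
    all of the same length as the sentence.
    """
    per_token = [vote_entropy(votes) for votes in zip(*committee_tags)]
    return sum(per_token) / len(per_token) if per_token else 0.0


def select_for_annotation(pool, committee, k):
    """Return the k sentences in pool on which the committee disagrees most.

    pool is a list of tokenized sentences; committee is a list of
    callables (hypothetical taggers) mapping a token list to a tag list.
    Ties are broken by original position in the pool.
    """
    scored = [
        (sentence_disagreement([tagger(sentence) for tagger in committee]), i)
        for i, sentence in enumerate(pool)
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [pool[i] for _, i in scored[:k]]
```

In an active learning loop, the selected sentences would be corrected manually, added to the training set, and the committee retrained. A random baseline replaces `select_for_annotation` with a uniform sample from the pool, which is the strategy the study found the committee method could not beat on training and tagging times.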

IOS Press Copyright 2024