As a guest user you are not logged in or recognized by your IP address. You have
access to the Front Matter, Abstracts, Author Index, Subject Index and the full
text of Open Access publications.
Part of speech taggers need a considerable amount of data to train their models. Such data is not readily available for medical texts in Portuguese. We evaluated the accuracy of a morphological tagger against a gold standard when trained with corpora of different sizes and domains. Accuracy was the highest with a medical corpus during the complete training process, achieving 91.5%. Training on a newswire corpus achieved 75.3% only. Furthermore, an active learning technique has been adapted to the POS tagging task. The algorithm uses a POS tagger committee to isolate the sentences with the highest disagreement indexes for manual correction. However, the method was not able to reduce training and tagging times when compared to a random selection strategy. We encourage that future works employ some effort in order to annotate a small amount of random data in the domain of study, which should be enough for higher accuracy rates.
This website uses cookies
We use cookies to provide you with the best possible experience. They also allow us to analyze user behavior in order to constantly improve the website for you. Info about the privacy policy of IOS Press.
This website uses cookies
We use cookies to provide you with the best possible experience. They also allow us to analyze user behavior in order to constantly improve the website for you. Info about the privacy policy of IOS Press.