
Improving part-of-speech tagging using lexicalized HMMs

Published online by Cambridge University Press: 13 May 2004

FERRAN PLA
Affiliation:
Departament de Sistemes Informàtics i Computació, Universitat Politècnica de València, Camí de Vera, s/n. 46020 València SPAIN e-mail: fpla@dsci.upv.es
ANTONIO MOLINA
Affiliation:
Departament de Sistemes Informàtics i Computació, Universitat Politècnica de València, Camí de Vera, s/n. 46020 València SPAIN e-mail: fpla@dsci.upv.es

Abstract

We introduce a simple method for building Lexicalized Hidden Markov Models (L-HMMs) to improve the precision of part-of-speech tagging. The technique enriches the contextual language model by taking into account an empirically selected set of words. We evaluated different lexicalization criteria on the Penn Treebank corpus using the TnT tagger. Lexicalization reduced the tagging error by about 6% on unseen test data without reducing the efficiency of the system. We have also studied how linguistic resources, such as dictionaries and morphological analyzers, improve tagging performance. Furthermore, an exhaustive experimental comparison shows that Lexicalized HMMs yield results that are better than or similar to those of other state-of-the-art part-of-speech tagging approaches. Finally, we have applied Lexicalized HMMs to the Spanish corpus LexEsp.
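
The abstract does not spell out how the language model is lexicalized, so the following is only a minimal sketch of one plausible reading: the tag of each selected word is replaced in the training data by a word-specific tag, so that the contextual (tag transition) model gains states tied to those words. The function names, the `~` separator, and the toy sentence are all assumptions made for illustration and are not taken from the paper or from TnT.

```python
from collections import Counter

SEP = "~"  # assumed separator between word and tag; not from the paper


def lexicalize(tagged_sentences, selected_words):
    """Return a copy of the corpus where selected words get word-specific tags."""
    result = []
    for sentence in tagged_sentences:
        new_sentence = []
        for word, tag in sentence:
            if word.lower() in selected_words:
                tag = f"{word.lower()}{SEP}{tag}"
            new_sentence.append((word, tag))
        result.append(new_sentence)
    return result


def transition_counts(tagged_sentences):
    """Count tag bigrams, the raw statistics behind a first-order contextual model."""
    counts = Counter()
    for sentence in tagged_sentences:
        tags = ["<s>"] + [tag for _, tag in sentence] + ["</s>"]
        counts.update(zip(tags, tags[1:]))
    return counts


def delexicalize(tag):
    """Map a word-specific tag back to the original tag for evaluation."""
    return tag.rsplit(SEP, 1)[-1]


# Toy example (hypothetical data): lexicalize the word "that".
corpus = [[("I", "PRP"), ("know", "VBP"), ("that", "IN"),
           ("he", "PRP"), ("left", "VBD")]]
lexicalized = lexicalize(corpus, {"that"})
counts = transition_counts(lexicalized)
```

In this sketch the transition `VBP -> that~IN` is counted separately from any generic `VBP -> IN` transition, which is the sense in which the contextual model becomes richer. After tagging, `delexicalize` would restore the original tag set so that output can be scored against the unmodified gold standard.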

Type
Papers
Copyright
© 2004 Cambridge University Press
