Improving part-of-speech tagging using lexicalized HMMs

FERRAN PLA; ANTONIO MOLINA

doi:10.1017/S1351324904003353

Abstract

We introduce a simple method for building Lexicalized Hidden Markov Models (L-HMMs) to improve the precision of part-of-speech tagging. The technique enriches the contextual language model by taking into account an empirically selected set of words. We evaluated different lexicalization criteria on the Penn Treebank corpus using the TnT tagger. Lexicalization reduced the tagging error by about 6% on unseen test data without reducing the efficiency of the system. We also studied how linguistic resources, such as dictionaries and morphological analyzers, improve tagging performance. Furthermore, an exhaustive experimental comparison shows that Lexicalized HMMs yield results that are better than or similar to those of other state-of-the-art part-of-speech tagging approaches. Finally, we applied Lexicalized HMMs to the Spanish corpus LexEsp.
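The abstract does not spell out how the lexicalization is carried out. One plausible reading, offered here only as a sketch, is that the training corpus is relabelled so that each selected word carries its own word-specific tag, after which an unmodified HMM tagger such as TnT can be trained on the relabelled data. In the Python below, the function names, the `~` separator, and the example word set are all hypothetical and are not taken from the article.

```python
from typing import List, Set, Tuple

TaggedSentence = List[Tuple[str, str]]  # (word, tag) pairs

SEP = "~"  # hypothetical separator; assumed not to occur inside any tag


def lexicalize(sentence: TaggedSentence, selected: Set[str]) -> TaggedSentence:
    """Relabel selected words with a word-specific tag (word~TAG).

    Words outside `selected` keep their original tag. Training a plain
    HMM tagger on the relabelled corpus gives it separate states for
    the selected words (an assumption about the method, see lead-in).
    """
    relabelled = []
    for word, tag in sentence:
        key = word.lower()
        new_tag = f"{key}{SEP}{tag}" if key in selected else tag
        relabelled.append((word, new_tag))
    return relabelled


def delexicalize(tags: List[str]) -> List[str]:
    """Map predicted tags back to the original tag set for evaluation."""
    return [tag.rsplit(SEP, 1)[-1] for tag in tags]


if __name__ == "__main__":
    # Toy sentence with Penn Treebank style tags; the selected set is made up.
    sentence = [("The", "DT"), ("plan", "NN"), ("that", "IN"), ("failed", "VBD")]
    relabelled = lexicalize(sentence, selected={"that"})
    print(relabelled)
    print(delexicalize([tag for _, tag in relabelled]))
```

Under this reading the tagger itself is unchanged: only the training labels grow, and stripping the word prefix from the predicted tags restores the original tag set before scoring.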

