Abstract
Entropy Guided Transformation Learning (ETL) is a new machine learning strategy that combines the advantages of Decision Trees (DT) and Transformation Based Learning (TBL). In this work, we apply the ETL framework to Portuguese Part-of-Speech Taggging. We use two different corpora: Mac-Morpho and Tycho Brahae. ETL achieves the best results reported so far for Machine Learning based POS tagging of both corpora. ETL provides a new training strategy that accelerates transformation learning. For the Mac-Morpho corpus this corresponds to a factor of three speedup. ETL shows accuracies of 96.75% and 96.64% for Mac-Morpho and Tycho Brahae, respectively.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Jurafsky, D., Martin, J.H.: Speech and Language Processing. Printice Hall (2000)
Brill, E.: Transformation-based error-driven learning and natural language processing: A case study in part-of-speech tagging. Comput. Linguistics 21, 543–565 (1995)
Ratnaparkhi, A.: A maximum entropy model for part-of-speech tagging. In: Brill, E., Church, K. (eds.) Proceedings of the Conference on Empirical Methods in Natural Language Processing, Somerset, New Jersey, pp. 133–142. Association for Computational Linguistics (1996)
Brants, T.: Tnt – a statistical part-of-speech tagger. In: ANLP, pp. 224–231 (2000)
Schmid, H.: Probabilistic part-of-speech tagging using decision trees. In: International Conference on New Methods in Language Processing, Manchester, UK (1994)
Giménez, J., Màrquez, L.: Fast and accurate part-of-speech tagging: The svm approach revisited. In: RANLP, pp. 153–163 (2003)
Toutanova, K., Klein, D., Manning, C.D., Singer, Y.: Feature-rich part-of-speech tagging with a cyclic dependency network. In: HLT-NAACL (2003)
Aires, R.V.X., Aluísio, S.M., e Silva Kuhn, D.C., Andreeta, M.L.B., Osvaldo, N., Oliveira, J.: Combining classifiers to improve part of speech tagging: A case study for brazilian portuguese. In: IBERAMIA-SBIA, pp. 227–236. ICMC/USP (2000)
Finger, M.: Técnicas de otimização da precisão empregadas no etiquetador tycho brahe. In: Proceedings of PROPOR, São Paulo, pp. 141–154 (2000)
Kepler, F.N., Finger, M.: Part-of-speech tagging of portuguese based on variable length markov chains. In: Vieira, R., Quaresma, P., Nunes, M.d.G.V., Mamede, N.J., Oliveira, C., Dias, M.C. (eds.) PROPOR 2006. LNCS (LNAI), vol. 3960, pp. 248–251. Springer, Heidelberg (2006)
Kepler, F.N., Finger, M.: Comparing two markov methods for part-of-speech tagging of portuguese. In: IBERAMIA-SBIA, pp. 482–491 (2006)
The lacio web project (accessed in January 23, 2008), http://www.nilc.icmc.usp.br/lacioweb/ferramentas.htm
dos Santos, C.N., Milidiú, R.L.: Entropy guided transformation learning. Technical Report 29/07, Departamento de Informática, PUC-Rio (2007)
Roche, E., Schabes, Y.: Deterministic part-of-speech tagging with finite-state transducers. Comput. Linguist. 21, 227–253 (1995)
Aluísio, S.M., Pelizzoni, J.M., Marchi, A.R., de Oliveira, L., Manenti, R., Marquiafável, V.: An account of the challenge of tagging a reference corpus for brazilian portuguese. In: Mamede, N.J., Baptista, J., Trancoso, I., Nunes, M.d.G.V. (eds.) PROPOR 2003. LNCS, vol. 2721, pp. 110–117. Springer, Heidelberg (2003)
IEL-UNICAMP, IME-USP: (Corpus anotado do português histórico tycho brahe (accessed in January 23, 2008), http://www.ime.usp.br/~tycho/corpus/
Milidiú, R.L., dos Santos, C.N., Duarte, J.C.: Phrase chunking using entropy guided transformation learning. In: Proceedings of ACL 2008, Columbus, Ohio (2008)
Curran, J.R., Wong, R.K.: Formalisation of transformation-based learning. In: Proceedings of the ACSC, Canberra, Australia, pp. 51–57 (2000)
Quinlan, J.R.: C4.5: programs for machine learning. Morgan Kaufmann Publishers Inc., San Francisco (1993)
Ramshaw, L., Marcus, M.: Text chunking using transformation-based learning. In: Armstrong, S., Church, K., Isabelle, P., Manzi, S., Tzoukermann, E., Yarowsky, D. (eds.) Natural Language Processing Using Very Large Corpora. Kluwer Academic Publishers, Dordrecht (1999)
dos Santos, C.N., Milidiú, R.L.: Probabilistic classifications with tbl. In: Proceedings of Eighth International Conference on Intelligent Text Processing and Computational Linguistics – CICLing, Mexico City, Mexico, pp. 196–207 (2007)
Mangu, L., Brill, E.: Automatic rule acquisition for spelling correction. In: Proceedings of The Fourteenth ICML. Morgan Kaufmann, San Francisco (1997)
Freitas, M.C., Duarte, J.C., dos Santos, C.N., Milidiú, R.L., Renteria, R.P., Quental, V.: A machine learning approach to the identification of appositives. In: Proceedings of Ibero-American AI Conference, Ribeirão Preto, Brazil (2006)
Milidiú, R.L., Duarte, J.C., Cavalcante, R.: Machine learning algorithms for portuguese named entity recognition. In: Proceedings of Fourth Workshop in Information and Human Language Technology, Ribeirão Preto, Brazil (2006)
dos Santos, C.N., Oliveira, C.: Constrained atomic term: Widening the reach of rule templates in transformation based learning. In: Bento, C., Cardoso, A., Dias, G. (eds.) EPIA 2005. LNCS (LNAI), vol. 3808, pp. 622–633. Springer, Heidelberg (2005)
Ngai, G., Florian, R.: Transformation-based learning in the fast lane. In: Proceedings of North Americal ACL, pp. 40–47 (2001)
Author information
Authors and Affiliations
Editor information
Rights and permissions
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Nogueira dos Santos, C., Milidiú, R.L., Rentería, R.P. (2008). Portuguese Part-of-Speech Tagging Using Entropy Guided Transformation Learning. In: Teixeira, A., de Lima, V.L.S., de Oliveira, L.C., Quaresma, P. (eds) Computational Processing of the Portuguese Language. PROPOR 2008. Lecture Notes in Computer Science(), vol 5190. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85980-2_15
Download citation
DOI: https://doi.org/10.1007/978-3-540-85980-2_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85979-6
Online ISBN: 978-3-540-85980-2
eBook Packages: Computer ScienceComputer Science (R0)