Portuguese Part-of-Speech Tagging Using Entropy Guided Transformation Learning

  • Conference paper
Computational Processing of the Portuguese Language (PROPOR 2008)

Abstract

Entropy Guided Transformation Learning (ETL) is a new machine learning strategy that combines the advantages of Decision Trees (DT) and Transformation Based Learning (TBL). In this work, we apply the ETL framework to Portuguese Part-of-Speech (POS) Tagging, using two corpora: Mac-Morpho and Tycho Brahe. ETL achieves the best results reported so far for Machine Learning based POS tagging of both corpora, with accuracies of 96.75% for Mac-Morpho and 96.64% for Tycho Brahe. ETL also provides a new training strategy that accelerates transformation learning; for the Mac-Morpho corpus, this corresponds to a threefold speedup.
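
The abstract does not describe the mechanism behind ETL. As a rough illustration only, the sketch below assumes that "entropy guided" means ranking contextual features by information gain, as in decision tree induction, so that the most informative features can seed TBL rule templates. The toy rows, the feature names (`prev_tag`, `is_capitalized`), and the function names are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(rows, feature, target="tag"):
    """Reduction in tag entropy obtained by splitting the rows on one feature."""
    base = entropy([row[target] for row in rows])
    groups = defaultdict(list)
    for row in rows:
        groups[row[feature]].append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return base - remainder


def rank_features(rows, features, target="tag"):
    """Order features from most to least informative about the target tag."""
    return sorted(features, key=lambda f: information_gain(rows, f, target), reverse=True)


# Hypothetical toy data: occurrences of an ambiguous word whose tag
# (article vs. preposition) is decided entirely by the previous tag.
rows = [
    {"prev_tag": "V", "is_capitalized": False, "tag": "ART"},
    {"prev_tag": "V", "is_capitalized": True, "tag": "ART"},
    {"prev_tag": "N", "is_capitalized": False, "tag": "PREP"},
    {"prev_tag": "N", "is_capitalized": True, "tag": "PREP"},
]

ranked = rank_features(rows, ["is_capitalized", "prev_tag"])
print(ranked)
```

In this toy setting `prev_tag` separates the two tags perfectly while `is_capitalized` carries no information, so a template built from the top-ranked feature would condition on the previous tag. How the actual system turns such rankings into templates is not stated in the abstract.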

Editor information

António Teixeira, Vera Lúcia Strube de Lima, Luís Caldas de Oliveira, Paulo Quaresma

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Nogueira dos Santos, C., Milidiú, R.L., Rentería, R.P. (2008). Portuguese Part-of-Speech Tagging Using Entropy Guided Transformation Learning. In: Teixeira, A., de Lima, V.L.S., de Oliveira, L.C., Quaresma, P. (eds) Computational Processing of the Portuguese Language. PROPOR 2008. Lecture Notes in Computer Science, vol 5190. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85980-2_15

  • DOI: https://doi.org/10.1007/978-3-540-85980-2_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-85979-6

  • Online ISBN: 978-3-540-85980-2

  • eBook Packages: Computer Science, Computer Science (R0)
