
Using Multiattribute Prediction Suffix Graphs for Spanish Part-of-Speech Tagging

  • Conference paper
Advances in Intelligent Data Analysis (IDA 2001)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 2189))


Abstract

This paper describes an implementation of a Spanish POS tagger. The implementation combines three basic approaches: a single-word tagger based on decision trees; a POS tagger based on a new learning model called the Multiattribute Prediction Suffix Graph; and a feature-structure set of tags. Using decision trees for single-word tagging lets the tagger work without a lexicon that merely enumerates possible tags. It also decreases the error rate, because there are no unknown words. The feature-structure set of tags is advantageous when the available training corpus is small and the tag set is large, which can be the case with morphologically rich languages such as Spanish. Finally, training the Multiattribute Prediction Suffix Graph model is more efficient than training traditional full-order Markov models, and the model achieves better accuracy.
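The contrast between a variable-memory model and a full-order Markov model can be illustrated with a minimal sketch. This is not the Multiattribute Prediction Suffix Graph itself, whose definition is not given in this abstract. It is a simplified stand-in that assumes plain atomic tags, and it predicts the next tag from the longest preceding tag context that has enough training evidence. The function names, the toy tag sequences, and the `max_order` and `min_count` parameters are all hypothetical.

```python
from collections import Counter, defaultdict


def train_suffix_counts(tag_sequences, max_order=2):
    """Count which tag follows each preceding context of length 0..max_order."""
    counts = defaultdict(Counter)
    for tags in tag_sequences:
        for i, tag in enumerate(tags):
            for k in range(max_order + 1):
                if i - k < 0:
                    break  # not enough history for a context this long
                context = tuple(tags[i - k:i])
                counts[context][tag] += 1
    return counts


def predict_next(counts, history, max_order=2, min_count=2):
    """Return (tag, context) using the longest context seen at least min_count times."""
    for k in range(min(max_order, len(history)), -1, -1):
        context = tuple(history[len(history) - k:])
        seen = counts.get(context)
        if seen and sum(seen.values()) >= min_count:
            return seen.most_common(1)[0][0], context
    return None, ()


# Toy training data (invented for illustration, not taken from the paper).
sequences = [
    ["DET", "NOUN", "VERB"],
    ["DET", "NOUN", "ADJ"],
    ["DET", "ADJ", "NOUN"],
    ["DET", "NOUN", "VERB"],
]
counts = train_suffix_counts(sequences)

# A well-attested two-tag context is used as is.
print(predict_next(counts, ["DET", "NOUN"]))
# A rare context falls back to shorter contexts until one has enough evidence.
print(predict_next(counts, ["DET", "ADJ"]))
```

In this sketch a long context is used only when it has been observed often enough, so rare contexts borrow statistics from shorter ones. A full-order model would instead need an estimate for every context of maximal length.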

This work has been partially supported by the FACA project, number PB98-0937-C04-01, of the CICYT, Spain. FACA is part of the FRESCO project.




Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Triviño-Rodriguez, J.L., Morales-Bueno, R. (2001). Using Multiattribute Prediction Suffix Graphs for Spanish Part-of-Speech Tagging. In: Hoffmann, F., Hand, D.J., Adams, N., Fisher, D., Guimaraes, G. (eds) Advances in Intelligent Data Analysis. IDA 2001. Lecture Notes in Computer Science, vol 2189. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44816-0_23


  • DOI: https://doi.org/10.1007/3-540-44816-0_23


  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42581-6

  • Online ISBN: 978-3-540-44816-7

  • eBook Packages: Springer Book Archive
