Abstract
This paper describes an implementation of a Spanish POS tagger. The implementation combines three basic approaches: a single-word tagger based on decision trees; a POS tagger based on a new learning model called the Multiattribute Prediction Suffix Graph; and a tag set represented as feature structures. Using decision trees for single-word tagging lets the tagger work without a lexicon that merely enumerates the possible tags of each word. It also lowers the error rate, because no word is treated as unknown. Representing tags as feature structures is advantageous when the available training corpus is small and the tag set is large, which can be the case for morphologically rich languages such as Spanish. Finally, training the multiattribute prediction suffix graph model is more efficient than training traditional full-order Markov models, and the model achieves better accuracy.
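The advantage of feature-structure tags on a small corpus can be illustrated with a minimal sketch. It is not the tagger described in the paper: the toy tags, the attribute names (`category`, `gender`, `number`) and both helper functions are hypothetical, and the attributes are assumed to be independent purely for illustration. The sketch compares the relative frequency of a tag treated as one indivisible symbol with a product of per-attribute relative frequencies.

```python
from collections import Counter

# Hypothetical toy corpus: each tag is a feature structure
# (category, gender, number). Not data from the paper.
training_tags = [
    ("noun", "masc", "sing"),
    ("noun", "fem", "sing"),
    ("noun", "masc", "plur"),
    ("adj", "fem", "sing"),
    ("adj", "masc", "sing"),
]

ATTRIBUTES = ("category", "gender", "number")


def atomic_probability(tag, tags):
    """Relative frequency of the whole tag, treated as one indivisible symbol."""
    return Counter(tags)[tag] / len(tags)


def factored_probability(tag, tags):
    """Product of per-attribute relative frequencies.

    Assumes the attributes are independent, which is a simplification
    made only for this illustration.
    """
    prob = 1.0
    for i in range(len(ATTRIBUTES)):
        counts = Counter(t[i] for t in tags)
        prob *= counts[tag[i]] / len(tags)
    return prob


# A combination that never occurs as a complete tag in the toy corpus.
unseen = ("adj", "fem", "plur")

print(atomic_probability(unseen, training_tags))    # 0.0
print(factored_probability(unseen, training_tags))  # small but non-zero
```

In this toy setting the atomic estimate gives the unseen tag zero probability, while the factored estimate stays positive because each attribute value was observed separately. That is the kind of data sparsity a large tag set creates on a small corpus.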
This work has been partially supported by the FACA project, number PB98-0937-C04-01, of the CICYT, Spain. FACA is a part of the FRESCO project.
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Triviño-Rodriguez, J.L., Morales-Bueno, R. (2001). Using Multiattribute Prediction Suffix Graphs for Spanish Part-of-Speech Tagging. In: Hoffmann, F., Hand, D.J., Adams, N., Fisher, D., Guimaraes, G. (eds) Advances in Intelligent Data Analysis. IDA 2001. Lecture Notes in Computer Science, vol 2189. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44816-0_23
DOI: https://doi.org/10.1007/3-540-44816-0_23
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42581-6
Online ISBN: 978-3-540-44816-7
eBook Packages: Springer Book Archive