Abstract
The process of labeling each word in a sentence with one of its lexical categories (noun, verb, etc.) is called tagging, and it is a key step in parsing and in many other language processing and generation applications. Automatic lexical taggers are usually based on statistical methods, such as Hidden Markov Models, which work with information extracted from large, available tagged corpora. This information consists of the frequencies of the contexts of the words, that is, of the sequences of their neighbouring tags. Thus, these methods rely on the assumption that the tag of a word depends only on its surrounding tags. This work proposes the use of a Messy Evolutionary Algorithm to investigate the validity of this assumption. This algorithm is an extension of the fast messy genetic algorithms, a variety of Genetic Algorithms that improve the survival of high-quality partial solutions, or building blocks. Messy GAs do not require all genes to be present in the chromosomes, and genes may also appear more than once. This allows us to study the kind of building blocks that arise, thus obtaining information about possible relationships between the tag of a word and other tags corresponding to any position in the sentence. The paper describes the design of a messy evolutionary algorithm for the tagging problem and a number of experiments on the performance of the system and the parameters of the algorithm.
Supported by project PR1/03-11588.
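To make the representation mentioned in the abstract concrete, the following minimal Python sketch shows how a messy chromosome for tagging might be encoded and expressed. It is an illustration only, not the paper's implementation. The names `decode`, `cut`, and `splice`, the tag labels, the example sentence, and the template are all hypothetical. Resolving repeated genes by keeping the first occurrence and filling missing positions from a template are conventions from the general messy GA literature; that the paper adopts the same conventions is an assumption.

```python
from typing import List, Tuple

# A gene pairs a word position in the sentence with a candidate tag.
# A messy chromosome is a variable-length list of genes: some positions
# may be missing (underspecified) and some may be repeated (overspecified).
Gene = Tuple[int, str]
Chromosome = List[Gene]


def decode(chromosome: Chromosome, template: List[str]) -> List[str]:
    """Express a messy chromosome as a full tag sequence.

    Assumed conventions: the first gene found for a position wins,
    and positions without a gene keep the tag given by the template.
    """
    tags = list(template)
    seen = set()
    for position, tag in chromosome:
        if position in seen:
            continue  # overspecified position: ignore later genes
        seen.add(position)
        tags[position] = tag
    return tags


def cut(chromosome: Chromosome, point: int) -> Tuple[Chromosome, Chromosome]:
    """Split a chromosome into two pieces at the given gene index."""
    return chromosome[:point], chromosome[point:]


def splice(left: Chromosome, right: Chromosome) -> Chromosome:
    """Concatenate two chromosome pieces into one."""
    return left + right


# Hypothetical three-word sentence and template (one default tag per word).
sentence = ["the", "can", "rusts"]
template = ["DET", "VERB", "NOUN"]

# Position 0 is missing and position 1 appears twice.
individual = [(1, "NOUN"), (2, "VERB"), (1, "VERB")]
print(list(zip(sentence, decode(individual, template))))

# Cut one parent and splice its head onto another chromosome.
head, _ = cut([(0, "DET"), (1, "NOUN")], 1)
child = splice(head, [(2, "VERB"), (1, "VERB")])
print(list(zip(sentence, decode(child, template))))
```

Because genes carry their position explicitly, a group of genes such as `(1, "NOUN"), (2, "VERB")` can survive cut and splice as a unit regardless of where it sits in the chromosome. This is the sense in which building blocks may relate tags at arbitrary positions in the sentence, not only adjacent ones.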
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Araujo, L. (2003). Studying the Advantages of a Messy Evolutionary Algorithm for Natural Language Tagging. In: Cantú-Paz, E., et al. Genetic and Evolutionary Computation — GECCO 2003. GECCO 2003. Lecture Notes in Computer Science, vol 2724. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45110-2_94
DOI: https://doi.org/10.1007/3-540-45110-2_94
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40603-7
Online ISBN: 978-3-540-45110-5
eBook Packages: Springer Book Archive