Studying the Advantages of a Messy Evolutionary Algorithm for Natural Language Tagging

Araujo, Lourdes

doi:10.1007/3-540-45110-2_94

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2724)

Included in the following conference series:

Genetic and Evolutionary Computation Conference

Abstract

The process of labeling each word in a sentence with one of its lexical categories (noun, verb, etc.) is called tagging, and it is a key step in parsing and in many other language processing and generation applications. Automatic lexical taggers are usually based on statistical methods, such as Hidden Markov Models, which work with information extracted from large available tagged corpora. This information consists of the frequencies of the contexts of the words, that is, of the sequences of their neighbouring tags. These methods therefore rely on the assumption that the tag of a word depends only on its surrounding tags. This work proposes the use of a Messy Evolutionary Algorithm to investigate the validity of this assumption. The algorithm is an extension of the fast messy genetic algorithms, a variety of Genetic Algorithms that improve the survival of high-quality partial solutions, or building blocks. Messy GAs do not require all genes to be present in a chromosome, and a gene may also appear more than once. This allows us to study the kind of building blocks that arise, and thus to obtain information about possible relationships between the tag of a word and other tags at any position in the sentence. The paper describes the design of a messy evolutionary algorithm for the tagging problem and a number of experiments on the performance of the system and the parameters of the algorithm.
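To make the messy representation concrete, the following is a minimal sketch and not the paper's implementation. It assumes the conventional messy-GA treatment: when a position is specified more than once, the first gene read left to right wins, and positions with no gene are filled from a template tagging. The names `decode`, `context_score`, the example tags, and the trigram counts are all hypothetical, and the score is only an illustration of "frequencies of tag contexts", not the fitness function used in the paper.

```python
from typing import Dict, List, Tuple

Gene = Tuple[int, str]  # (word position in the sentence, proposed tag)
Context = Tuple[str, str, str]  # (left tag, tag, right tag)


def decode(chromosome: List[Gene], template: List[str]) -> List[str]:
    """Express a messy chromosome as a full tag sequence.

    Overspecified positions: the first gene seen (left to right) wins.
    Underspecified positions: the template's tag is kept.
    """
    tags = list(template)
    seen = set()
    for position, tag in chromosome:
        if 0 <= position < len(tags) and position not in seen:
            tags[position] = tag
            seen.add(position)
    return tags


def context_score(tags: List[str], counts: Dict[Context, int]) -> int:
    """Sum the (hypothetical) corpus counts of each tag's immediate context."""
    padded = ["<s>"] + tags + ["</s>"]
    return sum(
        counts.get((padded[i - 1], padded[i], padded[i + 1]), 0)
        for i in range(1, len(padded) - 1)
    )


# Hypothetical example for the sentence "time flies fast".
template = ["NOUN", "NOUN", "ADV"]          # assumed default tagging
chromosome = [(1, "VERB"), (1, "NOUN")]     # position 1 appears twice, 0 and 2 never
counts = {                                  # invented counts, for illustration only
    ("<s>", "NOUN", "VERB"): 5,
    ("NOUN", "VERB", "ADV"): 7,
    ("VERB", "ADV", "</s>"): 3,
    ("<s>", "NOUN", "NOUN"): 2,
}

decoded = decode(chromosome, template)
decoded_score = context_score(decoded, counts)
template_score = context_score(template, counts)
```

Because a chromosome may name only a few positions, a short chromosome such as `[(1, "VERB")]` acts as a candidate building block whose contribution can be evaluated against the template, which is what makes it possible to inspect which combinations of positions tend to survive.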

Supported by project PR1/03-11588.

Author information

Authors and Affiliations

Dpto. Sistemas Informáticos y Programación, Universidad Complutense de Madrid, Madrid
Lourdes Araujo

Editor information

Editors and Affiliations

Center for Applied Scientific Computing (CASC), Lawrence Livermore National Laboratory, 7000 East Avenue, L-561, Livermore, CA, 94550, USA
Erick Cantú-Paz

About this paper

Cite this paper

Araujo, L. (2003). Studying the Advantages of a Messy Evolutionary Algorithm for Natural Language Tagging. In: Cantú-Paz, E., et al. Genetic and Evolutionary Computation — GECCO 2003. GECCO 2003. Lecture Notes in Computer Science, vol 2724. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45110-2_94

DOI: https://doi.org/10.1007/3-540-45110-2_94
Published: 18 June 2003
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40603-7
Online ISBN: 978-3-540-45110-5
eBook Packages: Springer Book Archive
