Abstract
Strict pattern-based methods of grammar induction are often frustrated by the apparently inexhaustible variety of novel word combinations in large corpora. Statistical methods offer a possible solution by allowing frequent well-formed expressions to overwhelm the infrequent ungrammatical ones. They also have the desirable property of being able to construct robust grammars from positive instances alone. Unfortunately, the “zero-frequency” problem entails assigning a small probability to all possible word patterns, so that ungrammatical n-grams become as probable as unseen grammatical ones. Further, such grammars are unable to take advantage of inherent lexical properties that should allow infrequent words to inherit the syntactic properties of the class to which they belong.
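The zero-frequency problem described above can be illustrated with a minimal sketch (this is an illustration of the general phenomenon, not a model from the paper): under add-one (Laplace) smoothing, an unseen but grammatical bigram and an unseen ungrammatical one receive exactly the same probability. The toy corpus and the function name `laplace_bigram_prob` are invented for this example.

```python
from collections import Counter

def laplace_bigram_prob(bigram, bigram_counts, unigram_counts, vocab_size):
    """Add-one smoothed P(w2 | w1): every unseen bigram gets the same mass."""
    w1, w2 = bigram
    return (bigram_counts[(w1, w2)] + 1) / (unigram_counts[w1] + vocab_size)

corpus = "the cat sat on the mat".split()
bigram_counts = Counter(zip(corpus, corpus[1:]))
unigram_counts = Counter(corpus)
V = len(unigram_counts)

# Both bigrams below are unseen; the smoothed model cannot tell that one
# is plausible English and the other is not.
p_good = laplace_bigram_prob(("the", "dog"), bigram_counts, unigram_counts, V)
p_bad = laplace_bigram_prob(("the", "the"), bigram_counts, unigram_counts, V)
assert p_good == p_bad
```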
This paper describes a genetic algorithm (GA) that adapts a population of hypothesis grammars towards a more effective model of language structure. The GA is statistically sensitive in that the utility of frequent patterns is reinforced by the persistence of efficient substructures. It also supports the view of language learning as a “bootstrapping problem,” a learning domain where it appears necessary to simultaneously discover a set of categories and a set of rules defined over them. Results from a number of tests indicate that the GA is a robust, fault-tolerant method for inferring grammars from positive examples of natural language.
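The general shape of such an approach can be sketched as follows. This is a deliberately simplified illustration of grammar induction by GA, not the authors' algorithm: each hypothesis grammar is a set of permitted word-class bigrams, fitness rewards coverage of positive examples and penalises grammar size, and new hypotheses arise by crossover and mutation. All names, the toy class inventory, and the fitness weighting are assumptions made for this sketch.

```python
import random

random.seed(0)

# Toy positive examples expressed over word classes.
SENTENCES = [("Det", "N", "V"), ("Det", "N", "V", "Det", "N")]
CLASSES = ["Det", "N", "V"]
ALL_RULES = [(a, b) for a in CLASSES for b in CLASSES]  # candidate bigram rules

def accepts(grammar, sentence):
    """A grammar accepts a sentence if every adjacent class pair is licensed."""
    return all(pair in grammar for pair in zip(sentence, sentence[1:]))

def fitness(grammar):
    """Reward coverage of the positive examples; penalise grammar size."""
    covered = sum(accepts(grammar, s) for s in SENTENCES)
    return covered - 0.1 * len(grammar)

def crossover(g1, g2):
    """For each candidate rule, inherit its presence from a random parent."""
    return {r for r in ALL_RULES if r in (g1 if random.random() < 0.5 else g2)}

def mutate(grammar):
    """Toggle one randomly chosen rule in or out of the grammar."""
    return grammar ^ {random.choice(ALL_RULES)}

population = [set(random.sample(ALL_RULES, 4)) for _ in range(20)]
for generation in range(50):
    population.sort(key=fitness, reverse=True)
    parents = population[:10]  # keep the fitter half
    population = parents + [
        mutate(crossover(random.choice(parents), random.choice(parents)))
        for _ in range(10)
    ]

best = max(population, key=fitness)
```

Note how the fitness function is where statistical sensitivity would enter a fuller model: grammars whose rules cover frequent patterns persist across generations, while rules licensing only rare or ungrammatical sequences are selected against.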
© 1996 Springer-Verlag Berlin Heidelberg
Smith, T.C., Witten, I.H. (1996). Learning language using genetic algorithms. In: Wermter, S., Riloff, E., Scheler, G. (eds) Connectionist, Statistical and Symbolic Approaches to Learning for Natural Language Processing. IJCAI 1995. Lecture Notes in Computer Science, vol 1040. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-60925-3_43
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-60925-4
Online ISBN: 978-3-540-49738-7
eBook Packages: Springer Book Archive