Abstract
This paper describes experiments that used Genetic Algorithms (GAs) to search for important word associations (phrases) in unstructured text documents obtained from the Internet in a specialized area of medicine. GAs can evolve sets of word associations with assigned significance weights from the point of view of document categorization (here with two classes: relevant and irrelevant documents). The categorization was about as reliable as the naïve Bayes method using only individual words; in addition, the GAs provided phrases consisting of one, two, or three words. The selected phrases were quite meaningful from a human point of view.
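To make the idea concrete, the following is a minimal sketch of how a GA might evolve significance weights for one- to three-word phrases and use them to separate relevant from irrelevant documents. It is not the authors' implementation: the chromosome encoding (one real-valued weight per candidate phrase), the decision rule (a document is relevant when the summed weight of its phrases is positive), the fitness (training accuracy), the operators (tournament selection, uniform crossover, Gaussian mutation, elitism), all parameter values, and the function names `ngrams`, `score`, `accuracy`, and `evolve` are assumptions chosen for illustration. The four training texts are invented toy data.

```python
import random


def ngrams(text, max_n=3):
    """Return the set of 1- to max_n-word phrases occurring in text."""
    words = text.lower().split()
    return {" ".join(words[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(words) - n + 1)}


def score(weights, phrases):
    """Sum the weights of the phrases present in one document."""
    return sum(w for p, w in weights.items() if p in phrases)


def accuracy(weights, docs):
    """Fraction of documents where (score > 0) matches the relevance label."""
    hits = sum((score(weights, phrases) > 0) == relevant
               for phrases, relevant in docs)
    return hits / len(docs)


def evolve(labeled_texts, pop_size=30, generations=40, seed=0):
    """Evolve one weight per candidate phrase (assumed encoding, not the paper's)."""
    rng = random.Random(seed)
    docs = [(ngrams(text), relevant) for text, relevant in labeled_texts]
    vocab = sorted(set().union(*(phrases for phrases, _ in docs)))

    def fitness(chromosome):
        return accuracy(dict(zip(vocab, chromosome)), docs)

    population = [[rng.uniform(-1, 1) for _ in vocab] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        next_gen = population[:2]  # elitism: carry over the two best unchanged
        while len(next_gen) < pop_size:
            # Tournament selection of two parents (tournament size 3).
            mother = max(rng.sample(population, 3), key=fitness)
            father = max(rng.sample(population, 3), key=fitness)
            # Uniform crossover followed by sparse Gaussian mutation.
            child = [rng.choice(pair) for pair in zip(mother, father)]
            child = [g + rng.gauss(0, 0.2) if rng.random() < 0.1 else g
                     for g in child]
            next_gen.append(child)
        population = next_gen

    best = max(population, key=fitness)
    return dict(zip(vocab, best)), fitness(best)


# Invented toy data: (text, is_relevant).
train = [
    ("laser surgery improves patient outcome", True),
    ("clinical trial of laser surgery", True),
    ("cheap flights and hotel deals", False),
    ("hotel deals for summer holidays", False),
]
weights, train_accuracy = evolve(train)
top_phrases = sorted(weights, key=weights.get, reverse=True)[:5]
print(train_accuracy, top_phrases)
```

In this sketch the highest-weighted phrases play the role of the "significant word associations": they are the ones that push a document toward the relevant class. A realistic setup would measure accuracy on held-out documents rather than on the training set.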
© 2003 Springer-Verlag Berlin Heidelberg
Žižka, J., Šrédl, M., Bourek, A. (2003). Searching for Significant Word Associations in Text Documents Using Genetic Algorithms. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2003. Lecture Notes in Computer Science, vol 2588. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36456-0_64
DOI: https://doi.org/10.1007/3-540-36456-0_64
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-00532-2
Online ISBN: 978-3-540-36456-6
eBook Packages: Springer Book Archive