Searching for Significant Word Associations in Text Documents Using Genetic Algorithms

Žižka, Jan; Šrédl, Michal; Bourek, Aleš

doi:10.1007/3-540-36456-0_64

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2588)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics

Abstract

This paper describes experiments that used Genetic Algorithms (GAs) to search for important word associations (phrases) in unstructured text documents obtained from the Internet in a specialized area of medicine. GAs can evolve sets of word associations with assigned significance weights from the document categorization point of view (here with two classes: relevant and irrelevant documents). The categorization was about as reliable as the naïve Bayes method using only individual words; in addition, the GAs provided phrases consisting of one, two, or three words. The selected phrases were quite meaningful from the human point of view.
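
The general idea in the abstract, evolving significance weights for one- to three-word phrases so that a weighted sum separates relevant from irrelevant documents, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy corpus, the real-valued chromosome encoding, the sign-of-score classifier, and the selection, crossover, and mutation operators (`ngrams`, `classify`, `fitness`, and `evolve` are hypothetical names).

```python
import random

# Toy, made-up corpus (assumption): label 1 = relevant, 0 = irrelevant.
DOCS = [
    ("laser surgery improves patient outcome", 1),
    ("patient outcome after laser surgery", 1),
    ("clinical trial of laser surgery", 1),
    ("cheap flights and hotel deals", 0),
    ("hotel deals for summer holidays", 0),
    ("football match results and hotel news", 0),
]


def ngrams(text, max_len=3):
    """Return the set of all phrases of 1..max_len consecutive words."""
    words = text.split()
    return {
        " ".join(words[i:i + n])
        for n in range(1, max_len + 1)
        for i in range(len(words) - n + 1)
    }


# Candidate phrases: every 1- to 3-word phrase seen in the corpus.
PHRASES = sorted(set().union(*(ngrams(text) for text, _ in DOCS)))
DOC_PHRASES = [(ngrams(text), label) for text, label in DOCS]


def classify(weights, phrases_in_doc):
    """Label a document relevant (1) if its summed phrase weights are positive."""
    score = sum(w for p, w in zip(PHRASES, weights) if p in phrases_in_doc)
    return 1 if score > 0 else 0


def fitness(weights):
    """Fraction of the toy corpus that the weight vector classifies correctly."""
    hits = sum(classify(weights, ps) == label for ps, label in DOC_PHRASES)
    return hits / len(DOC_PHRASES)


def evolve(pop_size=30, generations=40, mutation_rate=0.1, seed=0):
    """Simple elitist GA: one real-valued weight per candidate phrase."""
    rng = random.Random(seed)
    n = len(PHRASES)
    pop = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[:pop_size // 2]          # keep the better half unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)      # pick two distinct parents
            cut = rng.randrange(1, n)        # one-point crossover
            child = a[:cut] + b[cut:]
            # Mutation: occasionally redraw a weight at random.
            child = [
                rng.uniform(-1, 1) if rng.random() < mutation_rate else w
                for w in child
            ]
            children.append(child)
        pop = elite + children
    return max(pop, key=fitness)


best = evolve()
# Phrases with the largest evolved weights, i.e. the strongest "relevant" cues.
top = sorted(zip(PHRASES, best), key=lambda pw: pw[1], reverse=True)[:5]
print(fitness(best), top)
```

Because the better half of each generation is carried over unchanged, the best fitness never decreases from one generation to the next. A realistic setup would score fitness on held-out documents rather than on the training corpus, which this sketch skips for brevity.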

Author information

Authors and Affiliations

Faculty of Informatics, Department of Information Technologies, Masaryk University, Botanická 68a, 602 00, Brno, Czech Republic
Jan Žižka & Michal Šrédl
Faculty of Medicine, Department of Biophysics, Masaryk University, Joštova 10, 662 43, Brno, Czech Republic
Aleš Bourek

Editor information

Editors and Affiliations

Centro de Investigación en Computación (CIC), Instituto Politécnico Nacional (IPN), Col. Zacatenco, CP 07738, Mexico D.F., Mexico
Alexander Gelbukh

About this paper

Cite this paper

Žižka, J., Šrédl, M., Bourek, A. (2003). Searching for Significant Word Associations in Text Documents Using Genetic Algorithms. In: Gelbukh, A. (eds) Computational Linguistics and Intelligent Text Processing. CICLing 2003. Lecture Notes in Computer Science, vol 2588. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36456-0_64

DOI: https://doi.org/10.1007/3-540-36456-0_64
Published: 30 April 2003
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-00532-2
Online ISBN: 978-3-540-36456-6
eBook Packages: Springer Book Archive
