Multi-Objective Genetic Algorithms, NSGA-II and SPEA2, for Document Clustering

Lee, Jung Song; Choi, Lim Cheon; Park, Soon Cheol

doi:10.1007/978-3-642-27207-3_22

Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 257)

Abstract

This paper proposes a multi-objective genetic algorithm (MOGA) for document clustering. Hierarchical agglomerative algorithms, the k-means algorithm, and the general genetic algorithm (GA) have been widely studied for document clustering. However, hierarchical agglomerative algorithms are inefficient (O(n² log n)), the k-means algorithm depends too heavily on the initial centroids, and a general GA can converge to a local optimum when its objective function is poorly chosen. In this paper, two MOGA algorithms, NSGA-II and SPEA2, are applied to document clustering to overcome these disadvantages. We compare NSGA-II and SPEA2 with existing clustering algorithms (k-means and a general GA). Our experimental results show that the clustering performance of NSGA-II and SPEA2 is, on average, about 28% higher than that of the k-means algorithm and about 17% higher than that of the general GA.
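
Both NSGA-II and SPEA2 rank candidate solutions by Pareto dominance rather than by a single objective value. The following is a minimal sketch of that idea only, not the authors' implementation: the abstract does not say which objectives are optimized, so the two scores per candidate clustering, the function names, and the sample values are all hypothetical, and both objectives are assumed to be maximized.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if score vector a Pareto-dominates b (all objectives maximized)."""
    no_worse = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated_front(scores: List[Sequence[float]]) -> List[int]:
    """Indices of the candidates that no other candidate dominates."""
    front = []
    for i, s in enumerate(scores):
        dominated = any(
            dominates(t, s) for j, t in enumerate(scores) if j != i
        )
        if not dominated:
            front.append(i)
    return front


# Hypothetical (objective_1, objective_2) scores for five candidate clusterings.
candidate_scores = [(0.9, 0.2), (0.6, 0.6), (0.2, 0.9), (0.5, 0.5), (0.1, 0.1)]
print(non_dominated_front(candidate_scores))  # prints [0, 1, 2]
```

The first three candidates each trade one objective against the other, so none dominates another and all three stay in the front. A single-objective GA would have to collapse the two scores into one value and could discard two of them.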

Author information

Authors and Affiliations

Division of Electronics and Information Engineering, Jeonbuk National University, Jeonju, Jeonbuk, Republic of Korea
Jung Song Lee, Lim Cheon Choi & Soon Cheol Park

Editor information

Editors and Affiliations

Multimedia Engineering Department, Hannam University, 133 Ojeong-dong, Daeduk-gu, Daejeon, Korea
Tai-hoon Kim
The Ohio State University, 470 Hitchcock Hall, 2070 Neil Avenue, 43210-1275, Columbus, OH, USA
Hojjat Adeli
Catholic University of Daegu, South Korea
Haeng-kon Kim
Dept. of Computer Engineering, Mokwon University, 800, Doan-dong, Seo-gu, 302-729, Daejeon, Korea
Heau-jo Kang
Woosuk University, 565-701, Jeollabuk-do, Korea
Kyung Jung Kim
University of Michigan-Dearborn, Dearborn, MI, USA
Akingbehin Kiumi
School of Computing and Information Systems, University of Tasmania, Hobart, TAS, Australia
Byeong-Ho Kang

About this paper

Cite this paper

Lee, J.S., Choi, L.C., Park, S.C. (2011). Multi-Objective Genetic Algorithms, NSGA-II and SPEA2, for Document Clustering. In: Kim, Th., et al. Software Engineering, Business Continuity, and Education. ASEA 2011. Communications in Computer and Information Science, vol 257. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-27207-3_22

DOI: https://doi.org/10.1007/978-3-642-27207-3_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-27206-6
Online ISBN: 978-3-642-27207-3
eBook Packages: Computer Science, Computer Science (R0)
