Abstract
This paper proposes applying multi-objective genetic algorithms (MOGAs) to document clustering. Hierarchical agglomerative algorithms, the k-means algorithm, and the general genetic algorithm (GA) have been widely studied for document clustering. However, hierarchical agglomerative algorithms are inefficient (O(n^2 log n)), the k-means algorithm depends too heavily on the initial centroids, and the general GA can converge to a local optimum when its objective function is not suitable. In this paper, two MOGAs, NSGA-II and SPEA2, are applied to document clustering to overcome these disadvantages. We compare NSGA-II and SPEA2 with existing clustering algorithms (k-means and the general GA). Our experimental results show that, on average, NSGA-II and SPEA2 achieve about 28% higher clustering performance than the k-means algorithm and about 17% higher clustering performance than the general GA.
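The idea that separates a multi-objective GA from a single-objective one is Pareto dominance: candidates are ranked by whether another candidate beats them on every objective at once, not by a single fitness value. The following is a minimal sketch of that dominance test and of extracting the non-dominated front, the building block that NSGA-II and SPEA2 both rely on. The objective scores are hypothetical placeholders (the abstract does not state which objective functions the paper uses), both objectives are assumed to be maximized, and the function names are illustrative rather than taken from the paper.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` Pareto-dominates `b` (all objectives maximized).

    `a` must be at least as good as `b` on every objective and strictly
    better on at least one.
    """
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def non_dominated_front(scores: List[Sequence[float]]) -> List[int]:
    """Return indices of candidates that no other candidate dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Hypothetical (objective_1, objective_2) scores for four candidate
# clusterings; these numbers are made up for illustration only.
scores = [(0.9, 0.2), (0.6, 0.6), (0.5, 0.5), (0.2, 0.9)]
print(non_dominated_front(scores))  # -> [0, 1, 3]
```

Here the candidate scored (0.5, 0.5) is dropped because (0.6, 0.6) is better on both objectives, while the other three represent different trade-offs and are all kept. A single-objective GA would have to collapse these trade-offs into one number, which is where an unsuitable objective function can steer it toward a local optimum.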
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lee, J.S., Choi, L.C., Park, S.C. (2011). Multi-Objective Genetic Algorithms, NSGA-II and SPEA2, for Document Clustering. In: Kim, Th., et al. Software Engineering, Business Continuity, and Education. ASEA 2011. Communications in Computer and Information Science, vol 257. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-27207-3_22
DOI: https://doi.org/10.1007/978-3-642-27207-3_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-27206-6
Online ISBN: 978-3-642-27207-3
eBook Packages: Computer Science, Computer Science (R0)