Abstract
The genetic algorithm for document clustering (GC) shows good performance. However, the genetic algorithm suffers from performance degradation caused by the premature convergence phenomenon (PCP). In this paper, we propose a double layered genetic algorithm for document clustering (DLGC) to solve this problem. The clustering algorithms, including DLGC, are tested and compared on the Reuters-21578 data collection. The results show that DLGC outperforms both the traditional clustering algorithms (K-means, Group Average Clustering) and GC in various experiments.
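The abstract does not describe the internals of GC or DLGC, so the following is only a minimal sketch of a generic genetic algorithm for clustering, not the authors' method. It assumes each chromosome is a list of k centroids picked from the documents, and it uses the negative within-cluster squared error as fitness. The function names (`fitness`, `diversity`, `evolve`), the truncation selection, and the single-point crossover are all illustrative assumptions. The `diversity` history shows one way premature convergence might be observed: the number of distinct chromosomes in the population can drop well before the search has explored many solutions.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two document vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fitness(chromosome, docs):
    """Assumed fitness: negative sum of squared distances from each
    document to its nearest centroid (higher is better)."""
    return -sum(min(sq_dist(d, c) for c in chromosome) for d in docs)


def diversity(population):
    """Number of distinct chromosomes; a small value early in the run
    is one possible symptom of premature convergence."""
    return len({tuple(map(tuple, ch)) for ch in population})


def evolve(docs, k=2, pop_size=20, generations=30, mutation_rate=0.1, seed=0):
    """Hypothetical single-layer GA; returns (best chromosome, diversity history)."""
    rng = random.Random(seed)
    # Each chromosome starts as k distinct documents used as centroids.
    population = [rng.sample(docs, k) for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        ranked = sorted(population, key=lambda ch: fitness(ch, docs), reverse=True)
        history.append(diversity(ranked))
        parents = ranked[: pop_size // 2]      # truncation selection
        children = [ranked[0]]                 # elitism: keep the current best
        while len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            cut = rng.randrange(1, k) if k > 1 else 0
            child = p1[:cut] + p2[cut:]        # single-point crossover (new list)
            if rng.random() < mutation_rate:
                # Mutation: replace one centroid with a random document.
                child[rng.randrange(k)] = rng.choice(docs)
            children.append(child)
        population = children
    best = max(population, key=lambda ch: fitness(ch, docs))
    return best, history
```

Calling `evolve(docs)` on a small list of numeric tuples returns the best centroid set found and the per-generation diversity counts. Raising `mutation_rate` or weakening the selection pressure are common generic ways to delay convergence; whether DLGC does anything similar is not stated in the abstract.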
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Choi, L.C., Lee, J.S., Park, S.C. (2011). Double Layered Genetic Algorithm for Document Clustering. In: Kim, Th., et al. Software Engineering, Business Continuity, and Education. ASEA 2011. Communications in Computer and Information Science, vol 257. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-27207-3_21
DOI: https://doi.org/10.1007/978-3-642-27207-3_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-27206-6
Online ISBN: 978-3-642-27207-3
eBook Packages: Computer Science, Computer Science (R0)