Abstract
In this paper, a K-clustering method based on a real-coded multi-objective genetic algorithm (MOGA) is studied, where K denotes the number of clusters. In K-clustering, the value of K is known in advance. The search power of the genetic algorithm (GA) is exploited to find suitable clusters and cluster centers, so that the intra-cluster distance (Homogeneity, H) and the inter-cluster distance (Separation, S) are optimized simultaneously. This is achieved by measuring H and S with the Mod distance per feature metric, which is suitable for categorical features (attributes). We selected three benchmark data sets from the UCI Machine Learning Repository that contain only categorical features.
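The abstract does not spell out the Mod distance per feature metric or the exact forms of H and S. As a minimal sketch, assuming the metric is the number of mismatched categories divided by the number of features considered, that H is the mean distance from each record to its cluster center, and that S is the mean pairwise distance between centers, the two objectives and the Pareto-dominance test a MOGA would use to compare candidate solutions could look like this. All function names are hypothetical, not the authors' code.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Record = Tuple[str, ...]


def mod_distance(a: Record, b: Record, features: Sequence[int]) -> float:
    """Assumed metric: share of the given features on which a and b disagree."""
    mismatches = sum(1 for f in features if a[f] != b[f])
    return mismatches / len(features)


def homogeneity(data: List[Record], labels: List[int],
                centers: List[Record], features: Sequence[int]) -> float:
    """Mean distance from each record to its cluster center (to be minimized)."""
    total = sum(mod_distance(x, centers[k], features) for x, k in zip(data, labels))
    return total / len(data)


def separation(centers: List[Record], features: Sequence[int]) -> float:
    """Mean pairwise distance between cluster centers (to be maximized); needs K >= 2."""
    pairs = list(combinations(centers, 2))
    return sum(mod_distance(a, b, features) for a, b in pairs) / len(pairs)


def dominates(p: Tuple[float, float], q: Tuple[float, float]) -> bool:
    """p and q are (H, S) pairs. p dominates q if it is no worse on both
    objectives (lower H, higher S) and strictly better on at least one."""
    no_worse = p[0] <= q[0] and p[1] >= q[1]
    strictly_better = p[0] < q[0] or p[1] > q[1]
    return no_worse and strictly_better


if __name__ == "__main__":
    # Toy categorical data: three features, two clusters.
    data = [("a", "x", "p"), ("a", "x", "q"), ("b", "y", "p"), ("b", "y", "p")]
    labels = [0, 0, 1, 1]
    centers = [("a", "x", "p"), ("b", "y", "p")]
    all_features = [0, 1, 2]
    print(homogeneity(data, labels, centers, all_features),
          separation(centers, all_features))
    # Same partition scored on a feature subset only.
    print(homogeneity(data, labels, centers, [0, 1]),
          separation(centers, [0, 1]))
```

With all three features, the toy partition gives H = 1/12 and S = 2/3. Scoring it on features 0 and 1 only gives H = 0 and S = 1.0, which illustrates why restricting the objectives to a relevant feature subset can improve both H and S at once.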
The paper proposes two versions of the MOGA-based K-clustering algorithm. In the proposed MOGA (H, S), all features take part in building the chromosomes and in calculating the H and S values. In MOGA_Feature_Selection (H, S), only the selected features, those relevant to the clusters, take part in building the chromosomes. In both versions, K-modes is hybridized with the GA to combine the global search capability of the GA with the local search capability of K-modes. To account for context sensitivity, we use a special crossover operator called "pairwise crossover" together with a "substitution" operator. The main contribution of this paper is the simultaneous reduction of dimensionality and optimization of the objectives using MOGA.
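The abstract states that K-modes is hybridized with the GA as a local search step but gives no details. One common way to do this, sketched below under that assumption, is to take the cluster modes decoded from a chromosome and run a few K-modes passes on them: assign each record to its nearest mode, then reset each mode to the most frequent category of every feature among its members. The binary `feature_mask` is a hypothetical stand-in for the feature-selection part of MOGA_Feature_Selection (H, S); masked-out features are ignored in the distance and left untouched in the modes. The pairwise crossover and substitution operators are not modeled here.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Record = Tuple[str, ...]


def mismatch(a: Record, b: Record, features: Sequence[int]) -> float:
    """Share of the selected features on which a and b differ (assumed metric)."""
    return sum(1 for f in features if a[f] != b[f]) / len(features)


def kmodes_step(data: List[Record], modes: List[Record], features: Sequence[int]):
    """One K-modes pass: nearest-mode assignment, then per-feature majority update."""
    labels = [min(range(len(modes)), key=lambda k: mismatch(x, modes[k], features))
              for x in data]
    new_modes = []
    for k, old in enumerate(modes):
        members = [x for x, lab in zip(data, labels) if lab == k]
        if not members:
            new_modes.append(old)  # empty cluster: keep its mode unchanged
            continue
        mode = list(old)
        for f in features:  # only selected features are updated
            mode[f] = Counter(x[f] for x in members).most_common(1)[0][0]
        new_modes.append(tuple(mode))
    return labels, new_modes


def refine(data: List[Record], modes: List[Record],
           feature_mask: Sequence[bool], max_iter: int = 10):
    """Hypothetical local-search step applied to the modes decoded from a chromosome."""
    features = [f for f, keep in enumerate(feature_mask) if keep]
    if not features:
        raise ValueError("feature_mask selects no features")
    for _ in range(max_iter):
        labels, new_modes = kmodes_step(data, modes, features)
        if new_modes == modes:  # converged: labels match the returned modes
            return labels, modes
        modes = new_modes
    labels, _ = kmodes_step(data, modes, features)  # labels for the final modes
    return labels, modes


if __name__ == "__main__":
    data = [("a", "x", "p"), ("a", "x", "q"), ("b", "y", "p"), ("b", "y", "q")]
    initial_modes = [("a", "x", "q"), ("b", "x", "p")]
    labels, modes = refine(data, initial_modes, feature_mask=[True, True, False])
    print(labels, modes)
```

On the toy data in the `__main__` section, the refinement converges after two passes to labels `[0, 0, 1, 1]`. The third feature keeps its initial values in both modes because the mask excludes it, which is the behavior one would want when a chromosome has switched that feature off.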
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Dutta, D., Dutta, P., Sil, J. (2013). Categorical Feature Reduction Using Multi Objective Genetic Algorithm in Cluster Analysis. In: Gavrilova, M.L., Tan, C.J.K., Abraham, A. (eds) Transactions on Computational Science XXI. Lecture Notes in Computer Science, vol 8160. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45318-2_7
DOI: https://doi.org/10.1007/978-3-642-45318-2_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-45317-5
Online ISBN: 978-3-642-45318-2
eBook Packages: Computer Science, Computer Science (R0)