Abstract
Gene clustering is a common method for grouping genes with similar expression trajectories. Most clustering algorithms require the number of clusters a priori, and this number is usually hard to estimate, even for domain experts. In this paper, we use a Niched Pareto k-means Genetic Algorithm (GA) to cluster mRNA data. Running the multi-objective GA yields a Pareto-optimal front, a solution set that offers alternatives for the optimal number of clusters. We then analyze the clustering results with two cluster validity techniques commonly cited in the literature, the DB index and the SD index, which lets us rank the candidate numbers of clusters under each validity index. We tested the proposed approach in experiments on three data sets: figure2data, cancer (NCI60), and Leukaemia. The results are promising and demonstrate the applicability and effectiveness of the proposed approach.
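The two ideas in the abstract, a Pareto-optimal front of candidate clusterings and a validity index for ranking them, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The objective pair used here (number of clusters and a within-cluster variation score, both minimized) is an assumption made for illustration, since the abstract does not name the objectives. The function names `pareto_front`, `centroid`, and `davies_bouldin` are hypothetical, and the DB index is written in its usual textbook form with Euclidean distances.

```python
from math import dist


def pareto_front(candidates):
    """Return the candidates not dominated by any other candidate.

    Each candidate is a tuple (k, variation); both values are minimized.
    This objective pair is an assumption made for illustration only.
    """
    front = []
    for c in candidates:
        dominated = any(
            o[0] <= c[0] and o[1] <= c[1] and (o[0] < c[0] or o[1] < c[1])
            for o in candidates
        )
        if not dominated:
            front.append(c)
    return front


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def davies_bouldin(clusters):
    """Davies-Bouldin index of a partition; lower values are better.

    `clusters` is a list of at least two non-empty clusters with
    distinct centroids, each cluster being a list of points.
    """
    centers = [centroid(c) for c in clusters]
    # Scatter of a cluster: mean distance of its points to its centroid.
    scatter = [
        sum(dist(p, m) for p in c) / len(c) for c, m in zip(clusters, centers)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst-case similarity between cluster i and any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / dist(centers[i], centers[j])
            for j in range(k)
            if j != i
        )
    return total / k


# Hypothetical (k, variation) pairs; (4, 26.0) is dominated by (3, 25.0).
candidates = [(2, 40.0), (3, 25.0), (4, 26.0), (5, 10.0)]
front = pareto_front(candidates)

# Toy partition: two tight clusters that lie far apart.
toy_clusters = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
db_value = davies_bouldin(toy_clusters)
```

In a workflow like the one the abstract describes, each solution on the front would correspond to a complete clustering, and an index such as `davies_bouldin` would be evaluated on each of them to rank the candidate numbers of clusters.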
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Özyer, T., Liu, Y., Alhajj, R., Barker, K. (2004). Multi-objective Genetic Algorithm Based Clustering Approach and Its Application to Gene Expression Data. In: Yakhno, T. (eds) Advances in Information Systems. ADVIS 2004. Lecture Notes in Computer Science, vol 3261. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30198-1_46
DOI: https://doi.org/10.1007/978-3-540-30198-1_46
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23478-4
Online ISBN: 978-3-540-30198-1
eBook Packages: Computer Science (R0)