Abstract
DNA microarrays allow the expression levels of thousands of genes in an organism to be monitored and measured simultaneously. Systematic computational analysis of this vast amount of data provides insight into many aspects of biological processes. Recently, there has been growing interest in classifying patient samples based on these gene expression levels. The main challenge is that the number of genes overwhelms the number of available training samples in the data set; many of these genes are irrelevant for classification and reduce the accuracy of the classifier. The choice of genes affects several aspects of classification: accuracy, required learning time, cost, and the number of training samples needed. In this paper, we propose a new Probabilistic Model Building Genetic Algorithm (PMBGA) for identifying informative genes for molecular classification and present our unbiased experimental results on three benchmark data sets.
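To make the general idea concrete, the following is a minimal sketch of a generic univariate PMBGA (in the style of UMDA) applied to gene subset selection. It is not the algorithm proposed in the paper, whose details are not given in this abstract. The synthetic data, the nearest-centroid classifier, the subset-size penalty, and all names and parameter values (`make_toy_data`, `loocv_accuracy`, `fitness`, `umda_select`, `size_weight`, `init_prob`) are assumptions chosen for illustration only.

```python
import random


def make_toy_data(rng, n_samples=20, n_genes=30, informative=(0, 1, 2)):
    """Synthetic two-class data (assumption): only `informative` genes differ."""
    X, y = [], []
    for s in range(n_samples):
        label = s % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        for g in informative:
            row[g] += 3.0 * label  # shift informative genes for class 1
        X.append(row)
        y.append(label)
    return X, y


def loocv_accuracy(X, y, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier on `genes`."""
    if not genes:
        return 0.0
    correct = 0
    for i in range(len(X)):
        sums = {0: [0.0] * len(genes), 1: [0.0] * len(genes)}
        counts = {0: 0, 1: 0}
        for j in range(len(X)):
            if j == i:
                continue  # hold out sample i
            counts[y[j]] += 1
            for k, g in enumerate(genes):
                sums[y[j]][k] += X[j][g]
        # Squared Euclidean distance from sample i to each class centroid.
        dist = {
            c: sum((X[i][g] - sums[c][k] / counts[c]) ** 2
                   for k, g in enumerate(genes))
            for c in (0, 1)
        }
        pred = 0 if dist[0] <= dist[1] else 1
        correct += pred == y[i]
    return correct / len(X)


def fitness(X, y, mask, size_weight=0.01):
    """Accuracy minus a small penalty per selected gene (hypothetical choice)."""
    genes = [g for g, bit in enumerate(mask) if bit]
    return loocv_accuracy(X, y, genes) - size_weight * len(genes)


def umda_select(X, y, rng, pop_size=30, generations=15, init_prob=0.2):
    """Univariate PMBGA: sample masks from per-gene probabilities, then
    re-estimate those probabilities from the better half of the population."""
    n_genes = len(X[0])
    probs = [init_prob] * n_genes
    best_mask, best_fit = None, float("-inf")
    for _ in range(generations):
        pop = [[1 if rng.random() < p else 0 for p in probs]
               for _ in range(pop_size)]
        scored = sorted(((fitness(X, y, m), m) for m in pop),
                        key=lambda t: t[0], reverse=True)
        if scored[0][0] > best_fit:
            best_fit, best_mask = scored[0]
        elite = [m for _, m in scored[: pop_size // 2]]
        for g in range(n_genes):
            freq = sum(m[g] for m in elite) / len(elite)
            probs[g] = min(0.95, max(0.05, freq))  # clamp to keep diversity
    return best_mask, best_fit, probs


if __name__ == "__main__":
    rng = random.Random(0)
    X, y = make_toy_data(rng)
    mask, fit, probs = umda_select(X, y, rng)
    print("selected genes:", [g for g, bit in enumerate(mask) if bit])
    print("fitness:", fit)
```

Unlike a conventional genetic algorithm, this sketch uses no crossover or mutation: new candidate gene subsets are drawn from a probability vector that is re-estimated each generation. Because the fitness here is computed on the same samples that drive the selection, it is an optimistic estimate; an unbiased assessment would score the final subset on samples held out from the whole search.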
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Paul, T.K., Iba, H. (2004). Identification of Informative Genes for Molecular Classification Using Probabilistic Model Building Genetic Algorithm. In: Deb, K. (eds) Genetic and Evolutionary Computation – GECCO 2004. GECCO 2004. Lecture Notes in Computer Science, vol 3102. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-24854-5_42
DOI: https://doi.org/10.1007/978-3-540-24854-5_42
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22344-3
Online ISBN: 978-3-540-24854-5
eBook Packages: Springer Book Archive