Abstract
In bioinformatics, identifying the gene subsets responsible for classifying available samples into two or more classes (for example, ‘malignant’ or ‘benign’) is an important task. The main difficulties in solving the resulting optimization problem are that only a few samples are available compared with the number of genes in each sample, and that the search space of solutions is exorbitantly large. Although a few applications of evolutionary algorithms (EAs) to this task exist, we treat the problem as a multi-objective optimization problem of minimizing the gene subset size while simultaneously minimizing the number of misclassified samples. Contrary to past studies, we have discovered that a small gene subset (of size four or five, for example) is enough to classify 100% or nearly 100% of the samples correctly for three cancer data sets (Leukemia, Lymphoma, and Colon). In addition to a few variants of NSGA-II, one implementation modifies NSGA-II to find multi-modal non-dominated solutions, discovering as many as 630 different three-gene combinations that classify the Leukemia data with 100% accuracy. To perform the identification task with more confidence, we have also introduced a threshold on the prediction strength. All simulation results show consistent gene subset identification on the three disease data sets and demonstrate the flexibility and efficacy of using a multi-objective EA for the gene identification task.
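The two objectives and the domination criterion named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' NSGA-II implementation: the helper names `dominates` and `non_dominated`, the gene labels, and the error counts are all hypothetical and chosen only for illustration. Each candidate gene subset is scored by the pair (subset size, number of misclassified samples), both to be minimized, and subsets that share the same non-dominated objective vector are all retained, which is the multi-modal case of distinct gene combinations of equal quality.

```python
from typing import Dict, FrozenSet, List, Tuple

# (gene subset size, misclassified samples); both objectives are minimized.
Objectives = Tuple[int, int]


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b in every objective and strictly better in one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def non_dominated(candidates: Dict[FrozenSet[str], Objectives]) -> List[FrozenSet[str]]:
    """Return every subset whose objective vector no other candidate dominates.

    Subsets with identical objective vectors do not dominate each other,
    so all of them are kept (multi-modal solutions).
    """
    return [
        subset
        for subset, objs in candidates.items()
        if not any(dominates(other, objs) for other in candidates.values())
    ]


# Hypothetical gene labels and error counts, for illustration only.
candidates: Dict[FrozenSet[str], Objectives] = {
    frozenset({"g1", "g2", "g3"}): (3, 0),
    frozenset({"g4", "g5", "g6"}): (3, 0),   # same quality, different genes
    frozenset({"g1", "g7"}): (2, 1),
    frozenset({"g1", "g2", "g3", "g8"}): (4, 0),  # dominated by (3, 0)
    frozenset({"g9"}): (1, 4),
    frozenset({"g2", "g9"}): (2, 3),         # dominated by (2, 1)
}

front = non_dominated(candidates)
for subset in front:
    print(sorted(subset), candidates[subset])
```

In this toy data, both three-gene subsets with zero misclassifications survive the filter, whereas a conventional objective-space filter that keeps one representative per objective vector would discard one of them.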
Keywords
- Gene Subset
- Acute Myeloblastic Leukemia
- Prediction Strength
- Domination Criterion
- Multiobjective Evolutionary Algorithm
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Raji Reddy, A., Deb, K. (2003). Identification of Multiple Gene Subsets Using Multi-objective Evolutionary Algorithms. In: Fonseca, C.M., Fleming, P.J., Zitzler, E., Thiele, L., Deb, K. (eds) Evolutionary Multi-Criterion Optimization. EMO 2003. Lecture Notes in Computer Science, vol 2632. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36970-8_44
DOI: https://doi.org/10.1007/3-540-36970-8_44
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-01869-8
Online ISBN: 978-3-540-36970-7
eBook Packages: Springer Book Archive