Abstract
Rapid advances in genome-scale sequencing have created a need for new techniques to process very large genomic data sets. Microarrays, for example, generate so much gene expression data that it is usually analyzed with clustering algorithms. Clustering algorithms, however, are ineffective for visualization because they do not consider the order of genes within each cluster. This paper proposes a hybrid genetic algorithm for finding the optimal order of microarray data, that is, of gene expression profiles. We formulate the problem as a new type of traveling salesman problem and apply a hybrid genetic algorithm to it. To use the 2D natural crossover, we apply Sammon’s mapping to the microarray data. Experimental results show that the algorithm finds improved gene orders for visualizing gene expression profiles.
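To make the ordering objective concrete, here is a minimal sketch in Python. It rests on assumptions that the abstract does not state. It treats the gene order as an open path, scores it by the summed Euclidean distance between neighbouring expression profiles, and improves it with a plain 2-opt local search. The 2-opt step is only a stand-in for the hybrid genetic algorithm and is not the authors' method. The function names and the toy data are hypothetical.

```python
import math


def distance(a, b):
    """Euclidean distance between two expression profiles (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def order_cost(profiles, order):
    """Sum of distances between neighbouring genes in a linear order (open path)."""
    return sum(
        distance(profiles[order[i]], profiles[order[i + 1]])
        for i in range(len(order) - 1)
    )


def two_opt(profiles, order):
    """Reverse segments of the order for as long as a reversal shortens the path."""
    order = list(order)
    improved = True
    while improved:
        improved = False
        for i in range(len(order) - 1):
            for j in range(i + 1, len(order)):
                # Candidate order with the segment order[i..j] reversed.
                candidate = order[:i] + order[i:j + 1][::-1] + order[j + 1:]
                if order_cost(profiles, candidate) < order_cost(profiles, order) - 1e-12:
                    order = candidate
                    improved = True
    return order


# Toy data (hypothetical): six genes measured under three conditions.
profiles = [
    [0.0, 0.1, 0.2],
    [5.0, 5.1, 4.9],
    [0.2, 0.0, 0.1],
    [5.2, 4.8, 5.0],
    [2.5, 2.4, 2.6],
    [2.6, 2.5, 2.4],
]
initial_order = list(range(len(profiles)))
better_order = two_opt(profiles, initial_order)

print("initial cost:", order_cost(profiles, initial_order))
print("2-opt cost:  ", order_cost(profiles, better_order))
```

A lower path cost means that genes with similar profiles sit next to each other in the display, which is the property the abstract says clustering alone does not provide.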
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lee, SK., Kim, YH., Moon, BR. (2003). Finding the Optimal Gene Order in Displaying Microarray Data. In: Cantú-Paz, E., et al. Genetic and Evolutionary Computation — GECCO 2003. GECCO 2003. Lecture Notes in Computer Science, vol 2724. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45110-2_116
DOI: https://doi.org/10.1007/3-540-45110-2_116
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40603-7
Online ISBN: 978-3-540-45110-5
eBook Packages: Springer Book Archive