Abstract
Recent advances in the measurement of gene expression have allowed large data sets to become available for different types of analyses. In these data sets, the number of variables exceeds the number of observations by at least one order of magnitude. Substantial variable reduction is usually necessary before learning algorithms can be utilized in practice. Commonly used greedy variable selection strategies preclude the discovery of potentially important variable combinations if the variables in the combination are not sufficiently informative in isolation. Given the high dimensionality, artifacts are frequent, and evaluation techniques that prevent model overfitting need to be employed. In this article, we describe the factors that make the analysis of high-throughput gene expression data especially challenging, and indicate why properly evaluated stochastic algorithms can play a particularly important role in this process.
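The point about greedy selection can be illustrated with a small synthetic sketch. It is not taken from the chapter: the binary "genes", the XOR-style label, and the helper names (make_data, accuracy_single, accuracy_pair, random_pair_search) are all assumptions made for illustration. Two variables determine the class jointly yet each is uninformative alone, so a one-variable-at-a-time ranking gives them chance-level scores, whereas a simple random search over pairs is able to score them together.

```python
import random

def accuracy_single(xs, ys):
    """Best training accuracy of a rule that predicts y from one binary variable."""
    agree = sum(1 for x, y in zip(xs, ys) if x == y)
    return max(agree, len(ys) - agree) / len(ys)

def accuracy_pair(xa, xb, ys):
    """Best training accuracy of a rule based on the joint value of two binary variables."""
    counts = {}
    for a, b, y in zip(xa, xb, ys):
        counts.setdefault((a, b), [0, 0])[y] += 1
    return sum(max(c) for c in counts.values()) / len(ys)

def make_data(n_samples=200, n_noise=8, seed=0):
    """Hypothetical data: variables 0 and 1 interact (y = g0 XOR g1); the rest are noise."""
    rng = random.Random(seed)
    base = [(0, 0), (0, 1), (1, 0), (1, 1)] * (n_samples // 4)  # balanced design
    rng.shuffle(base)
    g0 = [a for a, _ in base]
    g1 = [b for _, b in base]
    y = [a ^ b for a, b in base]
    noise = [[rng.randint(0, 1) for _ in range(n_samples)] for _ in range(n_noise)]
    return [g0, g1] + noise, y

def random_pair_search(variables, y, n_draws=500, seed=1):
    """Stochastic search: sample random pairs and keep the best-scoring one."""
    rng = random.Random(seed)
    best_pair, best_score = None, 0.0
    for _ in range(n_draws):
        i, j = rng.sample(range(len(variables)), 2)
        score = accuracy_pair(variables[i], variables[j], y)
        if score > best_score:
            best_pair, best_score = tuple(sorted((i, j))), score
    return best_pair, best_score

variables, y = make_data()
single_scores = [accuracy_single(v, y) for v in variables]
print("single-variable scores:", [round(s, 2) for s in single_scores])
print("joint score of variables 0 and 1:", accuracy_pair(variables[0], variables[1], y))
print("random pair search:", random_pair_search(variables, y))
```

The scores here are training accuracies on the same data used to search, so they are optimistic. With far more variables than samples, some noise pairs will look good by chance, which is why any such search would need to be assessed on held-out data or by cross-validation.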
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ohno-Machado, L., Kuo, W.P. (2003). Stochastic Algorithms for Gene Expression Analysis. In: Albrecht, A., Steinhöfel, K. (eds) Stochastic Algorithms: Foundations and Applications. SAGA 2003. Lecture Notes in Computer Science, vol 2827. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39816-5_4
DOI: https://doi.org/10.1007/978-3-540-39816-5_4
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20103-8
Online ISBN: 978-3-540-39816-5
eBook Packages: Springer Book Archive