Abstract
Gene selection is one of the most widely used classes of data analysis algorithms for microarray datasets. The goal of a gene selection algorithm is to identify a small set of informative genes that best explains the experimental variation. Traditional gene selection algorithms are mostly single-gene based: a discriminative score is calculated for each gene, the genes are sorted by score, and the top-ranked genes are selected as informative genes for further study. Such algorithms completely ignore correlations between genes, even though these correlations are widely known to exist, because genes interact with each other through various pathways and regulatory networks. In this paper, we propose to use such correlations for gene selection instead of ignoring them. Experiments performed on three publicly available datasets show promising results.
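The single-gene ranking scheme described above can be sketched as follows. This is only an illustration and not the method proposed in the paper: the abstract does not say which discriminative score is used, so the signal-to-noise score here is an assumed, commonly used choice, and the function names and toy data are hypothetical. The sketch also shows the limitation the abstract points to, since two nearly redundant, highly correlated genes can both land at the top of the ranking.

```python
from statistics import mean, stdev

def signal_to_noise(values, labels):
    """Score one gene on its own (assumed score: |mean gap| / summed class std devs)."""
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    return abs(mean(a) - mean(b)) / (stdev(a) + stdev(b))

def rank_genes(expression, labels, k):
    """Return indices of the k top-scoring genes; each gene is scored independently."""
    scores = [signal_to_noise(row, labels) for row in expression]
    order = sorted(range(len(expression)), key=lambda g: scores[g], reverse=True)
    return order[:k]

def pearson(x, y):
    """Pearson correlation between two expression profiles."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical toy data: rows are genes, columns are samples (two classes).
labels = [0, 0, 0, 1, 1, 1]
expression = [
    [1.0, 1.2, 0.9, 3.0, 3.1, 2.9],  # gene 0: strongly discriminative
    [1.1, 1.4, 1.0, 3.1, 3.3, 3.0],  # gene 1: near copy of gene 0
    [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],  # gene 2: uninformative
    [0.5, 0.9, 0.7, 1.5, 1.1, 1.6],  # gene 3: weaker, different pattern
]

top = rank_genes(expression, labels, k=2)
redundancy = pearson(expression[top[0]], expression[top[1]])
print(top, round(redundancy, 3))
```

On this toy data the two selected genes are almost perfectly correlated, so the second adds little information beyond the first. A per-gene ranking has no way to notice this, because the score of one gene never looks at any other gene.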
This research is partly supported by National Science Foundation Grants DBI-0234895, IIS-0308001 and National Institutes of Health Grant 1 P20 GM067650-01A1. All opinions, findings, conclusions and recommendations in this paper are those of the authors and do not necessarily reflect the views of the National Science Foundation or the National Institutes of Health.
© 2005 Springer-Verlag Berlin Heidelberg
Cite this paper
Xu, X., Zhang, A. (2005). Virtual Gene: Using Correlations Between Genes to Select Informative Genes on Microarray Datasets. In: Priami, C., Zelikovsky, A. (eds) Transactions on Computational Systems Biology II. Lecture Notes in Computer Science, vol 3680. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11567752_10
DOI: https://doi.org/10.1007/11567752_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29401-6
Online ISBN: 978-3-540-31661-9
eBook Packages: Computer Science, Computer Science (R0)