Abstract
To emphasize gene interactions in classification algorithms, a new representation is proposed that comprises gene pairs rather than single genes. Each pair is represented by the L1 difference of the corresponding expression values. The novel representation is evaluated on benchmark datasets and is shown to often increase classification accuracy for genetic datasets. Exploiting the gene-pair representation and the Gene Ontology (GO), the semantic similarity of gene pairs can be incorporated to pre-select pairs with a high similarity value. The GO-based feature selection approach is compared to plain data-driven selection and is shown to often increase classification accuracy.
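The two ideas in the abstract can be illustrated with a minimal sketch. It assumes that the L1 difference of a pair is the absolute difference of the two genes' expression values within a sample, and that pre-selection keeps pairs whose semantic similarity reaches a threshold. The function names, the threshold rule, and the similarity matrix are hypothetical and chosen for illustration only; the abstract does not specify how similarity values are computed or how the cut-off is set.

```python
from itertools import combinations

import numpy as np


def gene_pair_features(X):
    """Map an (n_samples, n_genes) matrix to |x_i - x_j| for every gene pair i < j."""
    X = np.asarray(X, dtype=float)
    pairs = list(combinations(range(X.shape[1]), 2))
    i, j = np.array(pairs).T  # column indices of the first and second gene of each pair
    return np.abs(X[:, i] - X[:, j]), pairs


def preselect_pairs(pairs, similarity, threshold):
    """Return positions of pairs whose similarity is at least `threshold` (assumed rule)."""
    return [k for k, (i, j) in enumerate(pairs) if similarity[i, j] >= threshold]


# Toy expression data: 2 samples, 3 genes (made-up values).
X = np.array([[2.0, 5.0, 1.0],
              [4.0, 4.5, 3.0]])
features, pairs = gene_pair_features(X)

# Hypothetical symmetric gene-gene semantic similarity matrix, standing in
# for values that would be derived from GO annotations.
similarity = np.array([[1.0, 0.8, 0.1],
                       [0.8, 1.0, 0.4],
                       [0.1, 0.4, 1.0]])
keep = preselect_pairs(pairs, similarity, threshold=0.4)
selected = features[:, keep]  # pair features passed on to a classifier
```

Note that the number of pairs grows quadratically with the number of genes, which is one reason a pre-selection step is useful before training a classifier on the pair features.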
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Schön, T., Tsymbal, A., Huber, M. (2010). Gene-Pair Representation and Incorporation of GO-based Semantic Similarity into Classification of Gene Expression Data. In: Szczuka, M., Kryszkiewicz, M., Ramanna, S., Jensen, R., Hu, Q. (eds) Rough Sets and Current Trends in Computing. RSCTC 2010. Lecture Notes in Computer Science, vol 6086. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13529-3_24
DOI: https://doi.org/10.1007/978-3-642-13529-3_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13528-6
Online ISBN: 978-3-642-13529-3
eBook Packages: Computer Science, Computer Science (R0)