Abstract
We demonstrate a set-level approach to the integration of multiple platform gene expression data for predictive classification and show its utility for boosting classification performance when single- platform samples are rare. We explore three ways of defining gene sets, including a novel way based on the notion of a fully coupled flux related to metabolic pathways. In two tissue classification tasks, we empirically show that the gene set based approach is useful for combining heterogeneous expression data, while surprisingly, in experiments constrained to a single platform, biologically meaningful gene sets acting as sample features are often outperformed by random gene sets with no biological relevance.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Bild, A., Febbo, P.G.: Application of a priori established gene sets to discover biologically important differential expression in microarray data. PNAS 102(43), 15278–15279 (2005)
Bolstad, B.M., Irizarry, R.A., Astrand, M., Speed, T.P.: A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 19(2), 185–193 (2003)
The Gene Ontology Consortium. Gene ontology: tool for the unification of biology. Nature Genetics 25 (2000)
Gentleman, R.C., Carey, V.J., Bates, D.M., et al.: Bioconductor: Open software development for computational biology and bioinformatics. Genome Biology 5, R80 (2004)
Goeman, J., Bühlmann, P.: Analyzing gene expression data in terms of gene sets: methodological issues. Bioinformatics 23(8), 980–987 (2007)
Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning. Springer, Heidelberg (2001)
Ho, T.K.: The random subspace method for constructing decision forests. IEEE Transactions on Pattern Analysis and Machine Intelligence 20(8), 832–844 (1998)
Holec, M., Zelezny, F., Klema, J., et al.: Using bio-pathways in relational learning. In: Late Breaking Papers, 18th International Conference on Inductive Logic Programming (ILP 2008) (2008)
Huang, D.W., Sherman, B.T., Lempick, R.A.: Systematic and integrative analysis of large gene lists using david bioinformatics resources. Nature Protocols 4, 44–57 (2009)
Kanehisa, M., Goto, S., Kawashima, S., Okuno, Y., Hattori, M.: The KEGG resource for deciphering the genome. Nucleic Acids Res. 32, 277–280 (2004)
Mootha, V.K., Lindgren, C., Laureta, S., et al.: Pgc-1-alpha-responsive genes involved in oxidative phosphorylation are coorinately down regulated in human diabetes. Nature Genetics 34, 267–273 (2003)
Nicolae, D.L., De la Cruz, O., Wen, W., Ke, B., Song, M.: Invited keynote talk: Set-level analyses for genome-wide association data. In: Măndoiu, I., Sunderraman, R., Zelikovsky, A. (eds.) ISBRA 2008. LNCS (LNBI), vol. 4983, p. 1. Springer, Heidelberg (2008)
Notebaart, R.A., Teusink, B., Siezen, R.J., Papp, B.: Co-regulation of metabolic genes is better explained by flux coupling than by network distance. PLOS Computational Biology 4(1) (2008)
Edgar, R., Domrachev, M., Lash, A.E.: Gene expression omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 30(1), 207–210 (2002)
Rapaport, F., Zinovyev, A., Dutreix, M., Barillot, E., Vert, J.-P.: Classification of microarray data using gene networks. BMC Bioinformatics 8, 35 (2007)
Shaw, A.S., Filbert, E.L.: Scaffold proteins and immune-cell signalling. Nat. Rev. Immunol. 9(1), 47–56 (2009)
Stalteri, M.A., Harrison, A.P.: Interpretation of multiple probe sets mapping to the same gene in affymetrix genechips. BMC Bioinformatics 8, 13 (2007)
Tomfohr, J., Lu, J., Kepler, T.B.: Pathway level analysis of gene expression using singular value decomposition. BMC Bioinformatics 6 (2005)
Weichhart, T., Semann, M.D.: The PI3K/Akt/mTOR pathway in innate immune cells: emerging therapeutic applications. Ann Rheum Dis. suppl. 3, iii:70–74 (2008)
Witten, I.H., Frank, E.: Data Mining: Practical machine learning tools and techniques, 2nd edn. Morgan Kaufmann, San Francisco (2005)
Sun, Y., Chen, J.: mTOR signaling: PLD takes center stage. Cell Cycle 7(20), 3118–3123 (2008)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Holec, M., Železný, F., Kléma, J., Tolar, J. (2009). Integrating Multiple-Platform Expression Data through Gene Set Features. In: Măndoiu, I., Narasimhan, G., Zhang, Y. (eds) Bioinformatics Research and Applications. ISBRA 2009. Lecture Notes in Computer Science(), vol 5542. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01551-9_2
Download citation
DOI: https://doi.org/10.1007/978-3-642-01551-9_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01550-2
Online ISBN: 978-3-642-01551-9
eBook Packages: Computer ScienceComputer Science (R0)