Abstract
A series of genome-scale algorithms and high-performance implementations is described and shown to be useful in the genetic analysis of gene transcription. With them it is possible to address common questions such as: “are the sets of genes co-expressed under one type of condition the same as those co-expressed under another?” A new noise-adaptive graph algorithm, dubbed “paraclique,” is introduced and analyzed for use in biological hypothesis testing. A notion of vertex coverage is also devised, based on vertex-disjoint paths within correlation graphs, and used to determine the identity, proportion and number of transcripts connected to individual phenotypes and quantitative trait locus (QTL) regulatory models. A major goal is to identify which, among a set of candidate genes, are the most likely regulators of trait variation. These methods are applied in an effort to identify multiple-QTL regulatory models for large groups of genetically co-expressed genes, and to extrapolate the consequences of this genetic variation for phenotypes observed across levels of biological scale through the evaluation of vertex coverage. The approach is furthermore applied to definitions of homology-based gene sets and to the incorporation of categorical data such as known gene pathways. In all these tasks, discrete mathematics and combinatorial algorithms form the organizing principles upon which methods and implementations are based.
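The abstract does not spell out the paraclique procedure, so the following is only a minimal sketch of one plausible reading: threshold a correlation matrix into a graph, find a seed clique, then absorb vertices that miss at most a few edges to the current set. The function names, the `glom` tolerance parameter, the 0.8 threshold, and the toy correlation values are all assumptions made for illustration. The seed is found greedily here, which is a simplification and not an exact maximum-clique solver.

```python
from itertools import combinations

def correlation_graph(corr, threshold):
    """Undirected graph with an edge (u, v) when |corr[u][v]| >= threshold."""
    genes = list(corr)
    adj = {g: set() for g in genes}
    for u, v in combinations(genes, 2):
        if abs(corr[u][v]) >= threshold:
            adj[u].add(v)
            adj[v].add(u)
    return adj

def greedy_clique(adj):
    """Greedy seed clique (assumption: stands in for an exact clique solver)."""
    clique = set()
    # Visit vertices by decreasing degree; keep those adjacent to all chosen so far.
    for v in sorted(adj, key=lambda x: (-len(adj[x]), x)):
        if clique <= adj[v]:
            clique.add(v)
    return clique

def paraclique(adj, seed, glom=1):
    """Grow `seed` by adding vertices missing at most `glom` edges to the current set."""
    members = set(seed)
    changed = True
    while changed:
        changed = False
        for v in sorted(set(adj) - members):
            if len(members - adj[v]) <= glom:
                members.add(v)
                changed = True
    return members

# Toy data (hypothetical): a-d are mutually correlated; e is correlated with
# a, b, c but its edge to d is lost to noise; f is correlated with a only.
genes = "abcdef"
strong = {("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"),
          ("c", "d"), ("a", "e"), ("b", "e"), ("c", "e"), ("a", "f")}
corr = {u: {v: 0.0 for v in genes} for u in genes}
for u, v in strong:
    corr[u][v] = corr[v][u] = 0.9

adj = correlation_graph(corr, threshold=0.8)
seed = greedy_clique(adj)
print(sorted(seed))                            # seed clique
print(sorted(paraclique(adj, seed, glom=1)))   # seed plus near-clique vertices
```

In this toy graph the tolerance lets gene `e` join the group despite one missing edge, while `f` stays out. That is the sense in which such a relaxation can be called noise-adaptive; setting `glom=0` returns the seed clique unchanged.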
Copyright information
© 2007 Springer Berlin Heidelberg
About this paper
Cite this paper
Chesler, E.J., Langston, M.A. (2007). Combinatorial Genetic Regulatory Network Analysis Tools for High Throughput Transcriptomic Data. In: Eskin, E., Ideker, T., Raphael, B., Workman, C. (eds) Systems Biology and Regulatory Genomics. RSB RRG 2005 2005. Lecture Notes in Computer Science, vol 4023. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-48540-7_13
DOI: https://doi.org/10.1007/978-3-540-48540-7_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-48293-2
Online ISBN: 978-3-540-48540-7
eBook Packages: Computer Science (R0)