Abstract
Comparative genomics was previously misguided by the naïve dogma that what is true in E. coli is also true in the elephant. With the rejection of this dogma, comparative genomics has been placed in its proper evolutionary context. Here I numerically illustrate the application of phylogeny-based comparative methods, for both continuous and discrete characters, to problems in comparative genomics ranging from characterizing functional association of genes to detecting horizontal gene transfer and viral genome recombination. I also provide a detailed explanation and numerical illustration of statistical significance tests based on the false discovery rate (FDR). FDR methods are essential for the multiple comparisons associated with almost any large-scale comparative genomic study. I discuss the strengths and weaknesses of the methods and provide some guidelines on their proper application.
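To make the role of FDR control in multiple comparisons concrete, the following is a minimal sketch of the Benjamini-Hochberg step-up procedure, one widely used FDR method. It is not taken from the chapter: the function name `benjamini_hochberg`, the level `q`, and the p-values are all hypothetical and chosen only for illustration.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Flag which hypotheses are rejected at FDR level q (step-up procedure).

    Returns a list of booleans in the same order as p_values.
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k (1-based) whose p-value satisfies p_(k) <= k*q/m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank

    # Reject every hypothesis ranked at or below k_max, even if an
    # individual p-value in that range exceeds its own threshold.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values from ten tests (made up for this illustration).
p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(p, q=0.05))
```

With these made-up values only the two smallest p-values fall at or below their rank-scaled thresholds (0.005 and 0.010), so only the first two tests are declared significant, whereas five of the ten would pass an uncorrected 0.05 cutoff.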
Acknowledgements
I thank J. Felsenstein and M. Pagel for identifying ambiguities and errors in the manuscript and for their many suggestions for improving it. S. Aris-Brosou, Y. B. Fu, and G. Palidwor, as well as two anonymous reviewers, provided comments and references. I am supported by the Strategic Research, Discovery, and Research Tools and Instruments Grants of the Natural Sciences and Engineering Research Council of Canada.
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Xia, X. (2011). Comparative Genomics. In: Lu, HS., Schölkopf, B., Zhao, H. (eds) Handbook of Statistical Bioinformatics. Springer Handbooks of Computational Statistics. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16345-6_26
DOI: https://doi.org/10.1007/978-3-642-16345-6_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-16344-9
Online ISBN: 978-3-642-16345-6
eBook Packages: Mathematics and Statistics (R0)