Comparative Genomics

Handbook of Statistical Bioinformatics

Part of the book series: Springer Handbooks of Computational Statistics ((SHCS))

Abstract

Comparative genomics was previously misguided by the naïve dogma that what is true in E. coli is also true in the elephant. With the rejection of this dogma, comparative genomics has been placed in its proper evolutionary context. Here I numerically illustrate the application of phylogeny-based comparative methods in comparative genomics, involving both continuous and discrete characters, to problems ranging from characterizing the functional association of genes to detecting horizontal gene transfer and viral genome recombination. I also give a detailed explanation and numerical illustration of statistical significance tests based on the false discovery rate (FDR). FDR methods are essential for the multiple comparisons that arise in almost any large-scale comparative genomic study. I discuss the strengths and weaknesses of the methods and provide some guidelines on their proper application.
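As a rough illustration of the kind of FDR-based test the abstract refers to, the sketch below implements the standard Benjamini-Hochberg step-up procedure. It is not taken from the chapter: the function name `benjamini_hochberg`, the parameter `q`, and the example p-values are assumptions made for illustration, and the chapter's own treatment may differ (for example, by handling dependent tests).

```python
def benjamini_hochberg(p_values, q=0.05):
    """Return a list of booleans, True where a hypothesis is rejected.

    Illustrative sketch of the Benjamini-Hochberg step-up procedure;
    the name and signature are hypothetical, not from the chapter.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the p-values in ascending order of p.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k such that p_(k) <= k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Made-up p-values from eight hypothetical tests.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205]
example_rejected = benjamini_hochberg(example_p, q=0.05)
```

Because the procedure is step-up, a hypothesis can be rejected even when its own p-value exceeds its rank-specific threshold, provided a larger p-value further down the sorted list meets its threshold.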

Acknowledgements

I thank J. Felsenstein and M. Pagel for identifying ambiguities and errors in the manuscript and for their many suggestions for improving it. S. Aris-Brosou, Y. B. Fu and G. Palidwor, as well as two anonymous reviewers, provided comments and references. I am supported by the Strategic Research, Discovery, and Research Tools and Instruments Grants of the Natural Sciences and Engineering Research Council of Canada.

Corresponding author

Correspondence to Xuhua Xia.

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Xia, X. (2011). Comparative Genomics. In: Lu, HS., Schölkopf, B., Zhao, H. (eds) Handbook of Statistical Bioinformatics. Springer Handbooks of Computational Statistics. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16345-6_26
